RESEARCH VERSUS PROPAGANDA IN VISUAL
EDUCATION


FRANK N. FREEMAN 
University of Chicago 


An educational movement usually goes through three stages. The 
first stage in the adoption of new materials or methods is characterized 
by undiscriminating propaganda. This propaganda awakens a high 
degree of enthusiasm on account of its plausibility and the deficient 
criticism to which it is subjected. The enthusiasm with which the 
new movement is met leads to widespread adoption of the new 
devices. 

The second stage of the movement is one of reaction and of decline. 
As the new device is subjected to the criticism which is derived from 
the experience of many teachers, it becomes evident that the claims 
which were first advanced for it were greater than could be justified. 
The reaction which ensues, as is usual in social movements, proceeds 
beyond the point of equilibrium in the opposite direction, and the 
movement falls into disfavor. 

After a time the third stage sets in, due to the return of the pendu- 
lum toward the state of equilibrium. It is discovered that the truth lies 
between the over-enthusiasm of the first stage and the undue reaction 
of the second. The movement possesses some value for education 
but this value needs to be estimated by a careful study of its
possibilities and its relationships to other educational processes.

The procedure by which the true value of new educational processes 
is usually determined is unnecessarily wasteful. The third stage 
might be reached in a much more direct fashion if the critique which is 
applied in the second and third stages were introduced at the beginning, 
and if the unsystematic trial of the method in the class room were 
supplemented, and in a measure superseded, by the more systematic
and organized testing of scientific experimentation. By this means 
progress could be made more steadily and without the wasteful process
of large scale adoption of unproved methods. 

We are in a particularly advantageous situation to pursue this more 
economical and scientific mode of examination because, in the first 
place, we are conscious of our educational history, and have before us 
many examples of the sort of reaction which has just been described. 
This consciousness should put us on our guard against the too rapid 
and uncritical adoption of new movements. On the other hand it 
should prepare us for the acceptance of progressive development and 
for the adoption of changes which have been thoroughly tested. The 
history of education indicates that education never stands still. The 
progress of invention and of social life outside the school demands 
that the school shall be adapted to these changes. History, then, 
both points out the necessity of the adoption of advanced procedure 
and warns against unsystematic and unscientific acceptance of every 
new proposal. In the second place, we have an advantage over 
previous generations in the possession of scientific technique of 
investigation. The rapid advance of laboratory experimentation and 
of statistical methods in the past generation gives us tools of research 
which have never before been available for testing out new move- 
ments in advance of their adoption in the school room. 

The principles which have just been discussed apply with particular 
force to visual education. The various methods which are comprised 
under this head undoubtedly constitute an advance in educational 
procedure. They possess possibilities which should by all means be 
realized in the school room. On the other hand, there are signs that 
the advantages which visual education possesses are being somewhat 
over-estimated and viewed in an uncritical and unpsychological 
fashion. The uncritical enthusiasm which is being developed is 
expressing itself in the undiscriminating propaganda which is character- 
istic of the first stage of a new movement. How far this propa- 
ganda will lead to wide scale adoption before the method is sufficiently 
tried out, is at the present time uncertain. It is to be hoped that a 
careful critique will at least hasten the third stage of careful and dis- 
criminating estimate, so that the second stage of reaction may be 
omitted. 

We may enforce the statement that undiscriminating propaganda
is being made by a few examples. The most common statement made
by advocates of visual education runs something like this: “It is 
estimated by psychologists that 90 per cent of our sensory experience 
comes through the eye. It is also commonly accepted that the higher
mental processes, such as memory, imagination and reasoning, are
founded upon sensation. It follows, therefore, that education should
appeal chiefly to the sense of sight. Visual education promises to 
revolutionize educational procedure and to supplant the customary 
modes of presentation.” 

This argument is very plausible and is calculated to convince many 
people. It is, however, contrary to fundamental and accepted psycho- 
logical principles. Psychologists do not concern themselves with 
estimates of the relative frequency of sensations of the different 
sense organs. I am reasonably familiar with a good many texts in 
psychology but I never met with a statement which is at all akin to 
the one which forms the foundation of the above mentioned argument. 
Sensation does not possess the immediate significance which is implied 
in this argument. There is, therefore, no point in trying to estimate 
the relative proportions which one type of sensation bears to the 
others. 

A more particular examination will indicate some of the false 
assumptions which are contained in the argument. In the first place 
it is not true that an experience which is initiated by vision is wholly 
visual in character. In fact, the total experience may be very largely 
non-visual. The sensation which is the starting point of the experi- 
ence may be a relatively minor part in the whole. 

The emphasis upon the sensation leaves out of account the importance
of its interpretation. Sensation may mean vastly different
things to different persons. We may illustrate this and other features 
of the analysis from the experience of witnessing a football game. Two 
persons watching a game may have the same sensory experience. 
The significance of this sensory experience, however, may be very 
little to one person and very great to another. The actions of the 
players may even seem ludicrous to a person who does not know their 
intent or purpose. The various signals which are used, the movements 
which are made in various stages in preparation of a play, possess a 
meaning only to a person who understands the game. 

We must include also under interpretation certain features which 
would commonly be erroneously ascribed to sensation. While two 
persons might be exposed, in the photographic sense, to the same 
stimulus or set of stimuli, one person would see vastly more of what 
was going on than another. The novice, for example, would observe 
only a confused mass of players, while the trained observer would
notice which individual carried the ball, who formed the interference,
what opposing player broke through and tackled the runner, and what 
tactics were followed by the rest of the players. All this would be the 
result of previous training and experience. 

The sensation of the moment is supplemented, furthermore, by 
many other sensations which may easily be overlooked. In the exam- 
ple which we are considering the auditory sensations would form an 
important part of the total experience. The music of the band, the
cheering and the signals by the players or the officials, as well as the
undifferentiated sounds which emanate from the large crowd of people,
form very important elements in one’s consciousness. Without these, 
many of the actions would lose their significance, and much of the 
feeling or emotional tone would be absent. It is a question whether a 
person would not get a richer experience from the totality of the other 
sensations than he would from the visual sensations without the others. 
The foregoing description has left out of account perhaps the most 
important group of sensory experiences. These are the result of the 
active or motor responses which are made in any real situation. The 
spectator at an athletic contest exhibits these motor responses to a 
marked degree. Many other factors in the total experience might, 
of course, be described. One’s interests and relationship to the teams
themselves or to the institutions which are represented by the teams
are a determining factor in one’s total attitude.

The illustration has been carried far enough to indicate that the 
sensation which may be thought of as initiating the experience consti- 
tutes but a small fraction of the total experience. The total experi- 
ence is made up of many other sensations and of attitudes, ideas, and 
feelings which are the product of much previous experience or training. 
The particular sense through which the present experience happens to 
originate may be of much less importance than it appears to be on the 
surface. 

The relative unimportance of the sense stimulus is clearest in the 
case of intellectual processes. The same intellectual activity may be 
initiated by a variety of sense experiences. The comparative indiffer- 
ence of the initiating sensation may be summed up. 

In the first place, a large portion of any experience is derived from 
other sensations than those of the chief sense which was stimulated. 
Many of these sensations may exist quite independently of the one 
which usually initiates them. In fact, imagination alone may serve 
to reproduce or to set up the greater part of the entire experience. 

In the second place, the importance of the immediate sense is
reduced by the fact that the greater part of the experience may be 
non-sensory in character. Intellectual activities, while they may be 
originally derived from simple sensory and motor processes, go far 
beyond their simple origins. A conclusive piece of evidence in support 
of this statement is the fact that certain individuals can carry on 
complete intellectual operations of a high order, who are entirely 
deprived of the senses which are usually considered the most important— 
sight and hearing. I refer, of course, to such persons as Helen Keller 
and Laura Bridgman. Even in these cases, some sensation is necessary 
as a starting point, but these cases demonstrate that the character of 
the sensation does not determine the character of the thought. 

Finally, it is possible to translate from one sense to another. It 
has been found very difficult to determine whether or not it is more 
advantageous to learn by the use of one sense or another, because it is 
almost impossible to determine which sense is actually used by the 
individual learners. It is a psychological commonplace that the sense 
through which the presentation is made is not necessarily the one in 
which the person thinks. To take another example, every novel reader 
conjures up in his mind images of persons and scenes which are nearly 
as vivid and often more satisfactory than the pictures which might be 
presented to his visual sense. 

The burden of the foregoing discussion is that it is a very hazardous 
procedure to argue regarding the character of the total experience 
from the character of the sensation which appears on the surface to be 
the chief element of experience. The particular sense through which 
experiences in general are initiated is not of paramount importance. 
It is, of course, true that certain senses may possess advantages in 
particular cases, and it is furthermore unquestioned that certain special 
experiences can only be initiated by particular sensations. For 
example, music is dependent upon hearing, and the appreciation of 
painting is dependent upon sight. These, however, are special cases, 
and no wholesale argument can be based upon them. It is necessary 
rather that each case be examined for itself, in order that it may be 
determined what the most advantageous type of sensory stimulus may 
be. The thesis of this paper is that the various problems of presenta- 
tion must be treated as a series of special cases and that each must be 
decided on its merits. 

The foregoing discussion furnishes a criticism of certain psychological 
arguments which have been presented in support of visual education. 
That this criticism is justified is indicated by the results of the careful
examination of visual methods in a recent experiment. This experi- 
ment has been carried on by F. D. McClusky, who will shortly report 
the detailed results himself. I may, however, anticipate his report by 
citing some of the very general conclusions which are to be drawn from 
it. The investigation included over 700 children. It consisted in the 
comparison of the results of different forms of presentation of lessons 
in history, geography, and natural science. In each case the compari- 
sons were made with great care and with an observation of the various 
checks necessary to secure valid results. Comparison was made 
between different modes of visual presentation, such as motion pictures 
and slides, with combinations of oral and visual presentation and with 
oral presentation alone. 

The results of this study indicate that there is no justification for 
the adoption of the visual methods in exchange for those which are at 
present in use, on the basis of any wholesale conception of the superi- 
ority of vision. In fact, if the examples which were studied are to be 
taken as the sole basis of an estimate, one would have to conclude that 
visual methods possess little, if any, superiority, and that the newer
motion picture methods have no advantage over the older visual
methods. The chief reason for not accepting the results of the study at
their full value and adopting this conclusion, is that these newer 
methods probably possess potential values which have not yet been 
fully realized. In order to determine more exactly what these poten- 
tial values are, we need still broader investigation. Present investiga- 
tion, however, is entirely adequate to constitute a complete refutation 
of any sweeping claims for visual education on the ground of general 
supremacy of visual sensations. 

It is obvious that the problems of visual education are not solved 
at the present time. We should not expect them to be solved if we 
adopted a rational attitude toward the matter. The limitations of 
the visual method, which appear as a result of these experiments, 
would appear sooner or later as a result of their general use in the class 
room. It is, therefore, in the interests of the progress of visual educa- 
tion that its limitations be pointed out early. Experimentation is 
desirable also to indicate the direction in which visual methods should 
be developed, in order that their greatest possibilities may be realized. 

It is possible by a more careful psychological analysis to determine 
something of the special uses and advantages of visual education, and 
something of its limitations in advance of experimentation. This 
analysis, of course, must be regarded as somewhat tentative, but it
has the merit of considering the problems in specific fashion rather 
than in the wholesale fashion in which they are often viewed. One of 
the limitations of the method grows out of the fact that only a certain 
type of meaning can be conveyed by objects which are presented to 
the eye. I refer, of course, to concrete objects and not to visual 
symbols, such as the printed word. The meanings conveyed by 
concrete objects or their pictorial representation must be of a rather 
concrete, simple kind. Such representation is not suited to convey 
the more subtle abstract or general meanings. These meanings are 
conveyed by language. 

We are now in a period in which language is viewed with suspicion 
and disfavor. One may convince himself, however, of the necessity 
of language by observing its use in connection with motion pictures. 
Motion pictures themselves give the raw material of the experience, 
but the significance of this material is furnished by the captions or the 
reading passages. Let one refrain from reading these captions and he 
will be convinced of the comparatively large share of the meaning 
which is conveyed by them. It is true that a certain crude type of
meaning can be conveyed visually, such, for example, as physical
combat. This is probably the reason that fighting is so common an 
occurrence in the ordinary motion picture production. Consider, as 
another example, the representation of humor in motion pictures. 
Here again we find that a certain type of humor can be conveyed 
visually. This is made familiar by the “slap-stick, custard pie” style
of humor. Aside from this, however, the laugh is nearly always
elicited not by the picture itself, but by the caption. The limitation
of visual presentation in the type of meaning which it can convey must
be kept in mind in estimating the usefulness of visual presentation for 
education. 

A second limitation is that visual presentation in general, and 
especially motion pictures, dispenses largely with the personal influ- 
ence of the teacher and with the social inter-action of the members of 
the group. This is noteworthy at a time when it is thought desirable 
to extend rather than reduce the teacher’s influence in supervising the 
pupil’s learning processes. This is an aspect of the matter which 
should be carefully considered. The teacher before the class can hold 
the attention of the pupils by eye, voice, and personal presence, and 
can determine, by watching the children, whether they are following 
the discussion, and thus adapt the pace to their own progress. It is a
common assertion that motion pictures hold the attention of pupils
more strongly than do other forms of class exercise. McClusky’s 
study indicates that this is to be seriously questioned. This is a 
matter which needs further experimentation, and on which it is neces- 
sary to be cautious in accepting the conclusions from the entertain- 
ment movies. 

Over against these limitations are to be set certain probable advant- 
ages. It is undoubtedly true that pictures and visual stimuli gen- 
erally possess a certain immediate appeal. This is an appeal which 
visual material shares with other sensory stimuli. It is an advantage 
of visual stimuli particularly because of the large amount of material 
which is susceptible to this mode of presentation. If we could pre- 
sent the same material directly to the other senses, we might find the 
sensory appeal to be as strong as in the case of vision. When we 
estimate this appeal of visual material, however, we usually compare 
it with presentation through language, either in print or oral speech. 
While it is true, as has already been argued, that presentation through 
language is essential to give meanings of the more general or abstract 
sort, it is also true that for most persons, such presentation through 
language has somewhat less direct appeal than have sensory experi- 
ences. For the presentation of meanings which can be conveyed 
through sensory channels, therefore, it is desirable that concrete 
materials be employed. 

In the next place, certain types of relationships may be most clearly 
apprehended when they are represented visually. Visual devices are 
particularly suitable for the representation of spatial relations. One
may gain a much clearer notion of a geographic region from examina- 
tion of a map than by any other means. The construction and opera- 
tion of mechanical contrivances, furthermore, are better shown than 
described. It goes without saying that the graph is an unrivalled 
method of presenting certain types of relationships between facts, 
certain general comparisons and general trends and changes. Other 
types of changes may be represented peculiarly well by motion 
pictures. Noteworthy examples are the analysis of a rapid motion 
by a picture which is slowed down in the projection, and the repre- 
sentation of very slow movements by rendering them perceptible 
through speeding up the projection. All of these advantages are 
unquestioned and important. They indicate the direction which it 
would probably be most profitable for the development of visual 
methods to take. 

Motion pictures, slides, models, and other visual materials share
with text books the advantage of being the means of diffusing expert 
examples of presentation. H. G. Wells has emphasized this advantage
in an exaggerated fashion in his famous articles on education.
Pictorial representation is also of advantage in making widely available 
the working of rare or expensive apparatus. Similarly the perform- 
ance of a difficult act by an expert may be analyzed and presented 
broadcast. 

Experimental research should be devoted to the discovery of the 
types of educational material which are best adapted to visual pre- 
sentation. An analysis, such as the foregoing, can simply point out 
the most probable lines of development on the basis of our general 
psychological insight. Such analysis needs, however, to be supple- 
mented and verified by careful scientific procedure. 

Experimentation should be applied also to the study of certain 
problems in the development of the visual method itself. The prob- 
lems which are here mentioned relate particularly to motion pictures. 
One of these problems concerns the span of attention. The traditional 
material is organized so as to conform to the span of attention of pupils 
for whom it is intended. This is true of text book material and of oral
lessons. This organization has been developed empirically through 
long periods of use in class room. It is possible to work it out more 
quickly and systematically by scientific experimentation. 

Another problem which should be attacked is the best method of 
securing the attention of the pupils. The attractiveness of educational
films is sometimes exaggerated by pressing the analogy of films which 
are designed for entertainment. It is coming to be recognized, how- 
ever, that educational films must rely upon different sources of interest. 
We cannot rely simply upon the sensory appeal which has already 
been mentioned. The primary problem is so to organize the film from 
the standpoint of intellectual apprehension that it may furnish, in 
addition to the sensory appeal, both intellectual stimulation and 
satisfaction. In other words, the presentation must be adapted to 
the intellectual capacities, interests, and activities of the pupils. A 
subordinate problem concerns the methods by which the attention of 
the class may be kept upon those features of the presentation which 
are central. 

Related to this problem of attention is the question of the rapidity 
with which the units of thought or of subject matter are presented and 
the correlative question of amount of detail which should be included. 
A given topic may be presented rapidly by stressing only the outstanding
features, or it may be presented slowly with the addition of many
details. It is sometimes mistakenly supposed that the omission of 
details simplifies the presentation. Finally, it is necessary to deter- 
mine how much repetition and review is necessary in order to secure 
permanence of learning. All these questions are susceptible of experi- 
mental investigation. 

In the interests of visual education, then, experimental investiga- 
tion should be made to determine the type of educational subject 
matter to which it is best adapted, and the manner in which it may 
best be organized. Such a study will form the basis of steady and 
permanent progress. Unsound propaganda, on the other hand, will 
lead to more rapid initial progress, but this will be followed by a reac- 
tion which will result in slower progress in the end. 


A FURTHER CRITERION FOR THE SELECTION OF
MENTAL TEST ELEMENTS


J. CROSBY CHAPMAN 
AND 
A. BARBARA DALE 


Department of Education, Yale University 


In a short note¹ by one of the authors, attention was called to the
necessity of a further criterion for the selection of mental test elements. 
Excluding considerations of inter-correlations, the two criteria by 
which mental test elements are most commonly judged are: 

1. Increase of performance from age to age. 

2. Coherence. 





Neither of these criteria is very serviceable in investigating the extent
to which performance in a test is conditioned by hereditary brightness
or by mental changes produced largely by exposure to training
influences. Obviously, the first criterion, increase of ability with age,
tends to operate in the same direction as the training factor, while 
the coherence criterion is so crude as to offer no definite safeguard 
against this disturbing factor of environmental training. To quote
from the note to which reference has already been made: “In thus
establishing the validity of the test elements, we have been guilty
of loading the dice in our own favor; we have made our task too easy.
It is essential that we set up an additional criterion which will load
the dice in the opposite direction. Whereas in the above situation,
both factors, age and training, work in the same direction, we must
set up a criterion in which they work in opposite directions.”

¹ Chapman, J. Crosby: An Additional Criterion for the Selection of the
Elements of Mental Tests. Journal of Educational Psychology, April, 1921,
pp. 232-235.

The effect of environmental training as a complicating factor in
intelligence testing is, of course, no new discovery; Binet, Chotzen,
Stern, Terman, Pintner and Paterson, and many others have called
attention to its influence. But no one has pressed the matter to its
logical conclusion. What is required is a method of discrimination,
other than by personal judgment, of the relative weight to be placed
on the separate factors of intellect and training in the different elements
of a test. It is generally agreed that intelligence must be measured in
terms of the higher complex processes. It follows, therefore, that we
must so arrange our experimental conditions that the higher processes
can reveal their presence in the chronologically young but mentally
bright child. The test elements which measure the presence or 
absence of these powers must be freed from non-essential experience 
factors. Otherwise we shall never get evidence of the presence of 
these processes in the young bright, not because they do not exist, but 
because, to show themselves, it is required that they be exercised 
with a facility not yet acquired by, or with material not yet imparted 
to, younger children. For example, the performance of the young, 
precocious child, even in a test such as the opposites, may be prejudiced, 
not necessarily by lack of speed of conceptual thought, but by inade- 
quate vocabulary or possibly by the absence of facility in reading, due 
to lack of practice. The obvious fact that the bright child derives 
much more benefit than the dull child from precisely the same environ- 
ment, or from much less environmental influence, does not by any 
means make the effect of environment and training negligible. The 
temptation is always present to estimate intelligence by mental tests 
so constructed as regards facility demanded and information required 
as to result in a measurement of factors which are exceedingly closely 
correlated with grade position. Whenever this is done, knowingly 
or unknowingly, we are, really, abandoning the mental test as the cri- 
terion and putting our trust in the high correlation which we know exists 
between grade and intelligence. On the surface we are relying on a 
short performance in the selected mental test elements but in reality 
we are putting our confidence in the long continued process of school 
selection to raise our correlation. This procedure, while unquestion- 
ably yielding, in an easy manner, fairly high correlations, does great 
injustice to individuals whose environmental opportunities are not 
essentially normal. Unfortunately, the present statistical procedure, 
correlation formulae and mass treatment of data tend to burke the issue.

To return to the method of discriminating between native and 
environmental factors, let us suppose that a large number of children 
are given the same complete intelligence test, composed of various 
elements, some of which, typified by element A, call for a greater degree 
of native ability; while others, typified by element B, can be more 
satisfactorily performed by virtue of longer training and exposure to 
environmental stimulus. Suppose, moreover, that from amongst 
these children two groups, of differing chronological ages, can be so 
selected that each member of the one group is matched in total score 
by a corresponding member in the other group. Let us suppose that 
the first group, designated group O, consists of children over 13 years 
of age, drawn from Grades VII and VIII, while the younger group,
group Y, is drawn from Grades III and IV, the age in every case being 
less than 10 years. With the data on the various tests from two such 
groups, it is possible to investigate the problem of chronological matur- 
ity and environmental influence: Group O will make up its total score 
more by good performance in Test B than in Test A; while the reverse 
will hold for Group Y. Hence in Test A the average score of the 
younger children will exceed that of the older; in Test B, compensatory 
marks will be gained by the older children. By these means it will 
be possible to rank a series of tests according to the superiority shown 
by the young bright pupils as compared with the old, dull pupils. 
This, then, will be their order of merit as tests for native intellectual 
endowment; the reverse order will rank the tests according to the degree 
in which they are dependent on the training factor. The fundamental 
assumption underlying this argument will be examined later in the 
paper. 
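
The ranking just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the data layout, and the scores are all invented here and are not taken from the paper. It ranks each test by the amount by which the average of the younger group exceeds the average of the older group.

```python
from statistics import mean


def rank_tests_by_young_advantage(young_scores, old_scores):
    """Rank tests by (mean of younger group) - (mean of older group).

    Each argument maps a test name to the list of scores made on that
    test by the members of one group. The largest advantage comes first,
    i.e. the test that, on the paper's argument, depends most on native
    ability and least on training.
    """
    advantage = {
        test: mean(young_scores[test]) - mean(old_scores[test])
        for test in young_scores
    }
    return sorted(advantage.items(), key=lambda item: item[1], reverse=True)


# Invented scores for illustration only; these are NOT the paper's data.
young = {"A": [12, 14, 13], "B": [6, 7, 5]}
old = {"A": [9, 10, 11], "B": [10, 9, 8]}

ranking = rank_tests_by_young_advantage(young, old)
for test, diff in ranking:
    print(test, diff)
```

Reading the list from the top gives the order of merit for native endowment; reading it from the bottom gives the order of dependence on training.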

The authors had at their disposal about 5000 National Intelligence 
Test Blanks (Series A) which had resulted from the administration of 
this test to Grades III to VIII in several elementary schools in Mount 
Vernon, N. Y. This material was furnished by W. H. Holmes, the 
Superintendent of Schools, to whom the authors wish to express their 
thanks. From this material were selected two groups which fulfilled 
the following conditions: 

1. The parents of all children must be of American or British birth 
(i.e., English speaking).

2. The children of the first group (Young Bright, Y. B.) must be 
less than ten years of age, drawn chiefly from Grades III and IV. 

3. The children of the second group (Old Dull, O. D.) must be 
thirteen years of age or over and drawn from Grades VII and VIII. 

4. The scores obtained by a member of either group must fall 
between 70 and 119. Otherwise, for the child of 9, the extreme low 
score does not represent brightness, nor, for the pupil of 13 years, does 
the high score represent dullness. 

From these two groups a narrower selection was made by pairing, 
in the Old Dull group, each paper which could be matched in total 
score with a paper from the Young Bright group; if necessary a differ- 
ence of one mark, and no more, was allowed between the totals of any 
one pair. In this way fifty pairs of papers, each pair evenly matched, 
were obtained. That is to say, we have two groups of identical 
content as regards total scores, the one composed of the lowest 
scoring children of 13 years of age and over, and the other, of the 
highest scoring children of under 10 years. The methods of scoring 
and calculating totals were those dictated by the committee 
constructing the test. Interpreting the test strictly, the total 
scores being similar, this selection furnishes two groups of the same 
mental age, but of widely differing chronological age. Without 
strictly defining our terms we may, with reasonable accuracy, speak 
of an Old Dull group and of a Young Bright group, the average age of 
the first being 14 years 7 months, and the average age of the second, 
9 years 3 months. There is, therefore, for each paired paper an 
average difference in chronological age of about 5 years, with a 
range from 3 to 8 years. 
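
The pairing rule described above (equal totals, or a difference of one mark at most) can be sketched in code. This is only an illustration under stated assumptions, not the authors' clerical procedure: each paper is reduced to a hypothetical (identifier, total) pair, the function name `pair_papers` is invented here, and the sample totals are made up. The matching is greedy, so it need not find the largest possible number of pairs.

```python
def pair_papers(old_dull, young_bright, tolerance=1):
    """Match each Old Dull paper to an unused Young Bright paper whose
    total differs by at most `tolerance` marks, preferring exact matches.
    Papers are (paper_id, total_score) tuples (a hypothetical layout)."""
    unused = list(young_bright)
    pairs = []
    for od_id, od_total in sorted(old_dull, key=lambda p: p[1]):
        # Unused candidates within the allowed difference, closest first.
        candidates = sorted(
            (p for p in unused if abs(p[1] - od_total) <= tolerance),
            key=lambda p: abs(p[1] - od_total),
        )
        if candidates:
            match = candidates[0]
            unused.remove(match)
            pairs.append(((od_id, od_total), match))
    return pairs


# Made-up totals, all inside the 70-119 band used in the study.
old_dull = [("od1", 72), ("od2", 95), ("od3", 118)]
young_bright = [("yb1", 96), ("yb2", 73), ("yb3", 88)]
pairs = pair_papers(old_dull, young_bright)
for od, yb in pairs:
    print(od, "<->", yb)
```

Papers left without a partner within one mark (here the hypothetical `od3`) simply drop out, which is how a large pool can shrink to a small set of evenly matched pairs.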

Each group was then separated on the basis of total scores into the 
intervals 70-79, 80-89, etc. The average score for the subjects within 
each of these intervals, for each test, and for totals, was determined 
for both groups. From these data, weighted according to the number 
of cases in each interval, averages were obtained for each group as a 
whole, in each of the tests. These results are presented in Table I, 
where Tests 1, 2, 3, 4, 5, are arithmetical reasoning, sentence 
completion, logical sequence, opposites and symbol-digit substitution 
test, respectively. 


TABLE I.—SHOWING THE AVERAGE SCORES AND DISTRIBUTION OF THE TWO 
MATCHED GROUPS OF 50 CASES IN SUCCESSIVE INTERVALS OF 10 POINTS, 
THE FINAL AVERAGES AND RATIOS OF THESE IN EACH TEST 

                         |      Young Bright average scores         |        Old Dull average scores
Interval       Frequency |    I     II    III     IV      V   Total |    I     II    III     IV      V   Total
 70- 79            9     |  10.0   19.8   17.0   13.7   12.8   73.2 |  12.9   17.6   21.3   11.3   10.3   73.4
 80- 89            9     |   9.8   19.6   20.1   19.4   15.9   84.8 |  12.1   22.3   23.3    9.4   17.3   84.6
 90- 99           14     |  14.3   23.9   22.6   19.7   15.6   95.9 |  14.4   19.5   22.2   15.3   24.4   95.8
100-109           11     |  13.8   25.2   23.0   20.3   21.7  104.0 |  16.0   27.3   25.0   12.5   23.4  104.1
110-119            7     |  15.4   26.3   27.0   21.9   22.7  113.3 |  16.3   29.4   28.3   13.4   26.1  113.6
Final average     50     |  12.7   23.2   21.8   19.0   17.5   94.1 |  14.3   22.8   23.7   12.6   20.6   94.1

Ratio of average scores, Y.B./O.D.: 

                 Test I   Test II   Test III   Test IV   Test V
                  0.87     1.02       0.92       1.51     0.85
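
The final averages of Table I are described as averages of the interval averages, weighted by the number of cases in each interval. As a rough check on that description, the sketch below recomputes the final averages for Test IV (opposites) from the interval figures of the table. The function name `weighted_average` is an illustrative choice, not a term from the paper.

```python
# Cases per interval (70-79, 80-89, 90-99, 100-109, 110-119), from Table I.
frequencies = [9, 9, 14, 11, 7]

# Test IV interval averages for the two groups, from Table I.
young_bright_iv = [13.7, 19.4, 19.7, 20.3, 21.9]
old_dull_iv = [11.3, 9.4, 15.3, 12.5, 13.4]


def weighted_average(averages, weights):
    """Average of interval averages, weighted by cases per interval."""
    return sum(a * w for a, w in zip(averages, weights)) / sum(weights)


yb_final = weighted_average(young_bright_iv, frequencies)
od_final = weighted_average(old_dull_iv, frequencies)
print(round(yb_final, 1), round(od_final, 1))
```

To one decimal the recomputed values agree with the tabled final averages for Test IV (19.0 and 12.6).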

At the bottom of the table the ratio of the performance of the 
Young Bright to the Old Dull is shown. Where the ratio is greater 
than unity there is a tendency for the test, as scored in the present 
procedure, to favor native intelligence; where the ratio is less than 
unity, the emphasis is rather on the training factor. Table I shows 
that in Test 4 (opposites) there is a very large ratio in favor of the 
Young Bright group. Test 2, the sentence completion, occupies an 
intermediate position, while Tests 3, 1, and 5 show 
decreasing requirements of brightness, and if interpreted strictly, 
favor training and chronological maturity. While there is a very 
clear discrimination between the opposites test on the one hand, and 
the substitution test on the other, it is only by a more elaborate exami- 
nation of the data that the reliability of these results can be 
investigated. 
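
The ranking of the tests by the ratio of Young Bright to Old Dull averages can be reproduced with a few lines of code. The sketch below uses the final averages of Table I; because those averages are already rounded, ratios recomputed from them need not agree with the printed ratios in the second decimal place. The variable names are illustrative only.

```python
tests = ["I", "II", "III", "IV", "V"]

# Final averages from Table I.
young_bright = [12.7, 23.2, 21.8, 19.0, 17.5]
old_dull = [14.3, 22.8, 23.7, 12.6, 20.6]

# Ratio above 1: the test favors the Young Bright group.
ratios = {t: yb / od for t, yb, od in zip(tests, young_bright, old_dull)}

# Order the tests from the largest ratio to the smallest.
ranking = sorted(tests, key=lambda t: ratios[t], reverse=True)

for t in ranking:
    print(f"Test {t}: ratio {ratios[t]:.2f}")
```

The resulting order (IV, II, III, I, V) places the opposites test first and the substitution test last.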

To make this more complete investigation, the distribution in 
each test for each of the groups was made. For each of these 
distributions and for the totals, the average, median, and mean 
square deviation were determined. These are recorded in Table II 
and the data worked over in Table III. 


TABLE II.—SHOWING THE CENTRAL TENDENCIES AND VARIABILITY OF SCORES 
AND AGES FOR THE TWO MATCHED GROUPS OF 50, TOGETHER WITH 
RELIABILITY DATA 

                                                                     Age, 
Test                       I      II     III     IV      V    Total  Years, Mos. 

Young Bright 
Average                  13.72  23.76  22.32  19.76  18.16   94.7      9.4
Mean square deviation     3.55   4.39   4.58   4.01   5.48   13.4      0.6
σ of average              0.50   0.62   0.65   0.57   0.77    1.9
Median                   13.64  23.60  22.22  20.00  17.54   96.5      9.5
σ of median               0.62   0.77   0.81   0.71   0.96    2.4

Old Dull 
Average                  15.32  23.76  24.20  13.28  21.28   94.6     14.7
Mean square deviation     3.45   6.48   6.00   6.29   8.82   13.5      1.1
σ of average              0.49   0.92   0.85   0.89   1.25    1.9
Median                   15.85  23.43  25.33  13.54  23.33   96.5     14.7
σ of median               0.61   1.15   1.06   1.11   1.56    2.4

TABLE III.—SHOWING THE DIFFERENCES BETWEEN THE CENTRAL TENDENCIES 
IN THE TWO MATCHED GROUPS AND THE RELIABILITY DATA. ALSO RATIO 
OF AVERAGES AND MEDIANS FOR THE TWO GROUPS 

Young Bright score minus the Old Dull score in first four rows 

Test                              I       II     III      IV       V
Difference of averages          -1.60    0.00   -1.88    6.48    -3.12
σ of difference                  0.70    1.11    1.07    1.06     1.46
Difference of medians           -2.21    0.17   -3.11    6.46    -5.79
σ of difference                  0.87    1.40    1.33    1.32     1.83
Ratio of averages, Y.B./O.D.     0.89    1.00    0.92    1.49     0.85
Ratio of medians, Y.B./O.D.      0.86    1.05    0.88    1.48     0.75
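
The paper does not state how the reliability figures in Tables II and III were obtained. The tabled values are, however, consistent with the usual standard-error formulas: the deviation divided by the square root of the number of cases for an average, about 1.2533 times that for a median, and the root of the sum of squares for a difference. The sketch below applies these formulas to Test IV (opposites). The formulas are an assumption inferred from the figures, and treating the two matched groups as independent samples is a further simplification.

```python
import math

N = 50  # cases in each matched group


def se_mean(sd, n=N):
    """Standard error of an average."""
    return sd / math.sqrt(n)


def se_median(sd, n=N):
    """Large-sample, normal-theory standard error of a median."""
    return 1.2533 * sd / math.sqrt(n)


def se_difference(se_a, se_b):
    """Standard error of a difference, treating the groups as independent."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


# Test IV averages and mean square deviations, from Table II.
yb_avg, yb_sd = 19.76, 4.01
od_avg, od_sd = 13.28, 6.29

diff = yb_avg - od_avg
se_diff = se_difference(se_mean(yb_sd), se_mean(od_sd))
critical_ratio = diff / se_diff

print(round(se_mean(yb_sd), 2), round(se_mean(od_sd), 2))
print(round(diff, 2), round(critical_ratio, 1))
```

On these assumptions the Test IV difference of 6.48 marks is about six times its standard error, whereas the Test II difference of 0.00 is plainly within chance limits.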




















TABLE IV.—SHOWING THE CENTRAL TENDENCIES AND VARIABILITY OF THE 41 
MATCHED PAIRS (ZERO SCORES ELIMINATED). ALSO A MEASURE OF 
RELIABILITY 

                                                                       Age, 
Test                         I      II     III     IV      V    Total  Years, Mos. 

Young Bright 
Average                    14.17  24.44  21.98  20.17  18.59  97.50      9.5
Mean square deviation       3.61   3.94   4.12   4.08   5.71  11.06      0.5
σ of average                0.56   0.61   0.64   0.63   0.89   5.46
Median                     14.43  24.15  22.71  20.55  18.00  98.25      9.6
σ of median                 0.70   0.77   0.80   0.79   1.11   6.82

Old Dull 
Average                    15.54  24.05  23.76  13.9   23.55  97.38     14.8
Mean square deviation       2.96   6.25   6.17   3.40   3.79  11.19      1.0
σ of average                0.46   0.98   0.96   0.53   0.59   5.52
Median                     15.90  23.54  24.83  13.64  24.17  98.25     14.7
σ of median                 0.67   1.22   1.20   0.66   0.74   6.90

Ratio of averages, Y.B./O.D.  0.91   1.02   0.93   1.45   0.79
Ratio of medians, Y.B./O.D.   0.91   1.03   0.91   1.51   0.74































To meet the criticism that the results obtained for the median and 
the average scores of the two groups were affected by the presence of a 
few zero scores, this point was given further investigation. While no 
zero scores occurred in the younger group, it was discovered that there 
were four zero scores amongst the older group in Test 4, and six in 
Test 5. 

To leave no doubt in the matter, those papers containing zero 
scores were eliminated in the older group, and the corresponding 
paired papers were taken from the younger group. This reduced the 
number of pairs to 41 and the process of calculating the central tenden- 
cies, etc. was repeated for these pairs. The results are combined in 
Table IV which is too similar to previous tables to need further 
comment. 

As confirmatory evidence of the reliability of the above results, it 
may be well to add that, for three other pairs of 20 apiece, selected 
without eliminating the language question, the same general figures are 
obtained. The final averages for these pairs furnish the following 
ratios: 

Test                   I      II     III     IV      V
Ratio, Y.B./O.D.     0.92   1.04   0.97   1.44   0.78


Before any conclusions are drawn, one point should be made clear. 
Where the ratios expressing the performance of the Young Bright to 
the Old Dull differ greatly from unity it is safe to draw deductions. 
Where, however, the differences are small, nothing can be inferred. 
Suppose, for example, in a battery of four tests, one test was favorable 
to native intelligence, the other three being neither favorable to the 
young nor to the old group. In the first mentioned test the extra 
score of the young would have to be compensated by slightly lower 
scores on the three other tests, if the totals of the two groups were 
the same. We cannot, therefore, make any absolute quantitative 
estimates but must confine ourselves to stating, under the present 
system of scoring, the order of merit of the tests as measures of native 
intelligence, rather than of environmental training. These rank 
as follows: 

(1) Opposites Test, (2) Sentence Completion, (3) Logical Selec- 
tion, (4) Arithmetical Problems, (5) Symbol-digit Substitution. 
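
The compensation argument made above, in which equal totals force an advantage on one test to be offset on the others, can be put in numbers. The sketch below uses an invented battery of four tests with a made-up base score of 20 marks per test; none of the figures come from the study.

```python
def compensating_deficit(advantage, n_other_tests):
    """Average marks that must be lost on each remaining test so that
    totals stay equal when one test yields `advantage` extra marks."""
    return advantage / n_other_tests


# Hypothetical battery of four tests; the young group gains 6 marks on one.
deficit = compensating_deficit(6.0, 3)

young = [20.0 + 6.0] + [20.0 - deficit] * 3
old = [20.0] * 4

print(deficit, sum(young), sum(old))
```

A six-mark advantage on one test thus appears as a two-mark deficit on each of three tests that are in fact neutral, which is why small departures of a ratio from unity cannot be interpreted.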

It is now time to examine more closely the fundamental assumption 
upon which the procedure of this study is based. This may be stated 
as follows: A test element in which the performance of the Young 
Bright exceeds that of the Old Dull is, except in unusual 
circumstances,¹ ipso facto, a superior test of native intelligence. In 
those tests where the Old Dull are superior to the Young Bright, this 
superiority can be explained in two ways. The first and most obvious 
explanation would be found in the fact that the Old Dull have been 
exposed for a longer period to training influences. The second 
explanation, which is less probable, would advance the hypothesis 
that the superiority of attainment of the Old Dull was caused not 
necessarily by longer exposure to environmental influence but, rather, 
by the maturation of innate mental powers due to greater chronologi- 
cal age. While it is impossible to show that the second explanation 
is erroneous, the following facts may be adduced as cumulative evi- 
dence against it. The two tests which exhibit the superiority of the 
Old Dull are the substitution and arithmetical reasoning tests. The 
former is somewhat dependent on acquired eye and hand co-ordination, 
while the latter is obviously much subject to school training. Further- 
more, the test in which the young show their maximum superiority is 
the opposites, involving a somewhat high form of conceptual thinking, 
which is not subject to direct practice in the ordinary procedure of the 
school. It is also difficult to explain why the inner development due 
to the chronological age of the duller pupils should result in increased 
powers in one direction, with no similar increase in other directions. 
Certainly, the burden of proof rests with those who maintain the 
second position. The first explanation is much the simpler, fits in 
much better with the observed facts, and is supported by the theory 
of the general development of intellectual power. 

In the light of these deductions, the acceptance of the total score 
as the criterion of equality of intelligence is subject to criticism. It 
may be said that these results are subject to the fallacy that our test 
of total intelligence is derived from the data, the validity of which we 
are examining. To meet this objection, it may be urged that as we 
are compelled to have some measure of intelligence, the total is 
probably the most reliable. Certainly as the test is usually 
employed the verdict depends on the total in all the tests. 

If, as we are bound to assume when we employ a group test, the 
same total represents the same mental age, whatever may be the 
chronological age, this study shows that the mentality of the Young 
Bright is different from that found in the Old Dull. It also shows that 
intellectual development is not marked by a simultaneous and uniform 
improvement in all types of mental function. The Old Dull subject 
of mental age x is the superior, equal, or inferior of the Young Bright 
of mental age x, mental age being measured by totals, according to 
the type of test used. This is surely a peculiar state of affairs. 

1 For example, where the Young Bright had recently practiced a function 
which the Old Dull had forgotten. 

It would seem therefore as though a very difficult problem must be 
faced by those who construct intelligence tests. If we select elements 
which are analogous to the opposites test, we shall thereby give 
advantage to a power which is found in the Young Bright, but to a less 
extent in the Old Dull of the same mental age, as measured by a group 
test. If, on the other hand, we use such tests as substitutions, we shall 
under-estimate the intelligence of the younger child and probably over- 
estimate that of the older. This gives rise to the anomalous situation 
that the score of a pupil, whether bright or dull, can be raised or low- 
ered according to the elements which are selected for the test. 

If the Young Bright and the Old Dull are scoring on different tests, 
this may have a significant effect upon the inter-correlations of tests; 
that is, upon the logic of the partial correlation method, which partly 
determines the selection of tests. Tests which are free from environ- 
mental training influence may be eliminated because of high inter- 
correlations, in favor of tests showing lower inter-correlations produced 
by the environmental factor. If this is the case, upon what adequate 
psychological and sociological criterion shall selection be based? 

A further point of debate raised by this study is the advisability, 
or even the possibility, of making comparisons of subjects of one 
chronological age with subjects of another chronological age. In the 
light of the results which have been obtained, is it fair to compare the 
performance of an 8-year-old with that of a 10- or 12-year-old? 
The use of mental age and the corresponding IQ coefficient assumes the 
legitimacy of the procedure. In individual examinations, we give to 
precocious children of 9 years of age the tests which have only been 
devised and justified for children of, let us say, 12 years of age. For 
example, both criteria used by Terman only establish the fact that the 
test element is valid for children of approximately that chronological 
age in the neighborhood of its particular age position. Such procedure 
establishes no right to use this test, forthwith, on younger age groups 
that have not had the same period of environmental influence. 
Environmental influence is no theoretical factor which can be elimi- 
nated by shifting the tests from age group to age group, in the attempt 
to get a normal distribution of intelligence at each age. 

It would appear that the use of the mental age method of 
measurement, while practically straightforward, is subject to such 
inherent defects that for finer work in the realm of intelligence 
measurement it must eventually be displaced. A child of 8 and a child 
of 12 cannot be compared. It is an impossibility to select test 
elements which are not affected by the additional 4 years of 
environmental influence enjoyed by the latter child. Eventually we 
must state the performance of the x-year-old in terms of the 
performance of a large group of x-year-old children, using either 
percentiles, or distances in terms of 
sigma. Even with this precaution, the differences in environmental 
opportunity of the x-year-olds will make it difficult enough to select 
fair test elements. Under the present scheme such a selection is 
probably impossible to attain. 
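
The two within-age measures proposed above, percentiles and distances in terms of sigma, can be sketched as follows. The norm group is a made-up set of five totals for children of a single age, and the function names are illustrative. Counting ties as half a case is one common convention for percentile rank, adopted here as an assumption.

```python
from statistics import mean, pstdev


def percentile_rank(score, same_age_scores):
    """Percent of same-age scores below `score`, counting ties as half."""
    below = sum(s < score for s in same_age_scores)
    ties = sum(s == score for s in same_age_scores)
    return 100.0 * (below + 0.5 * ties) / len(same_age_scores)


def sigma_distance(score, same_age_scores):
    """Distance of `score` from the age-group mean, in units of sigma."""
    return (score - mean(same_age_scores)) / pstdev(same_age_scores)


# Hypothetical totals for a norm group of children of one age.
norm_group = [70, 80, 90, 100, 110]

pr = percentile_rank(100, norm_group)
z = sigma_distance(100, norm_group)
print(pr, round(z, 2))
```

Either figure compares a child only with children who have had the same length of exposure to environmental influence, which is the point of the proposal.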


SUMMARY 


1. The question of the effect of environmental influence and train- 
ing on performance in mental tests is again raised. 

2. A new criterion is proposed which will rank tests with reference 
to the weight placed on hereditary brightness rather than on environ- 
mental training. 

3. To apply this criterion, two groups, on the basis of totals in the 
National Intelligence Test, Series A, are selected, one consisting of the 
Young Bright and the other the Old Dull. The members of one 
group are paired, as far as totals in the test are concerned, with mem- 
bers in the other group. 

4. It is shown that these two groups, identical in totals, score 
in differing amounts in the five tests constituting the examination. 

5. The Opposites Test seems to depend, to a high degree, on native 
intelligence, while the Arithmetical Problems and Substitution depend 
more upon the environmental factor. 

6. The significance of the above results and the effect on the selec- 
tion of mental test elements is discussed. 

7. The legitimacy of the present method of estimating intelligence 
by the IQ method is considered. 








THE CORRELATIONS OF ACHIEVEMENT IN SCHOOL 
SUBJECTS WITH INTELLIGENCE TESTS AND 
OTHER VARIABLES (CONCLUDED) 


ARTHUR I. GATES 
Teachers College, Columbia University 

5. The Intercorrelations of School Subjects—Table VIII gives the 
mean intercorrelations of school subjects, and the correlations with 
MA and CA. 


TABLE VIII.—INTERCORRELATIONS OF SCHOOL SUBJECTS. FIGURES ARE THE 
MEANS OF THE COEFFICIENTS FOR GRADES IV TO VIII INCLUSIVE 

                         Reading   Reading  Arith-  Spell-  Writ-   Mental  Chrono-
                         Compre-   Rate     metic   ing     ing     Age     logical
                         hension                                            Age
Reading Comprehension     ....      0.85     0.24    0.45    0.06    0.49    -0.25
Reading Rate              0.85      ....     0.22    0.38    0.21    0.46    -0.35
Arithmetic                0.24      0.22     ....    0.24    0.09    0.30    -0.19
Spelling                  0.45      0.38     0.24    ....    0.02    0.31    -0.23
Writing                   0.06      0.21     0.09    0.02    ....    0.08     0.05
Mental Age                0.49      0.46     0.30    0.31    0.08    ....
Chronological Age        -0.25     -0.35    -0.19   -0.23    0.05            ....

















Generally, the correlations of one subject with others are not high. 
Writing shows no association of significance with any variable here 
listed. Arithmetic correlates but slightly with other subjects. 
Spelling is more closely associated with reading than with other 
subjects. Our criteria of General Achievement (a composite of Reading 
Comprehension, Reading Rate, Arithmetic and Spelling) are therefore 
seriously in need of study and evaluation. The measures of the partic- 
ular subjects were extensive enough to warrant confidence in their relia- 
bility, although they may fall considerably short of perfect validity. 
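
The statements above about Table VIII can be checked by averaging, for each subject, its correlations with the other four subjects. The sketch below assumes that the fourth and fifth rows of the table are Spelling and Writing, as the discussion implies. A plain average of coefficients is a crude summary, used here only for illustration.

```python
subjects = ["Reading Comprehension", "Reading Rate", "Arithmetic",
            "Spelling", "Writing"]

# Intercorrelations among school subjects, from Table VIII.
r = {
    ("Reading Comprehension", "Reading Rate"): 0.85,
    ("Reading Comprehension", "Arithmetic"): 0.24,
    ("Reading Comprehension", "Spelling"): 0.45,
    ("Reading Comprehension", "Writing"): 0.06,
    ("Reading Rate", "Arithmetic"): 0.22,
    ("Reading Rate", "Spelling"): 0.38,
    ("Reading Rate", "Writing"): 0.21,
    ("Arithmetic", "Spelling"): 0.24,
    ("Arithmetic", "Writing"): 0.09,
    ("Spelling", "Writing"): 0.02,
}


def corr(a, b):
    """Look up a coefficient regardless of the order of the pair."""
    return r[(a, b)] if (a, b) in r else r[(b, a)]


# Mean correlation of each subject with the remaining four.
mean_r = {
    s: sum(corr(s, other) for other in subjects if other != s) / (len(subjects) - 1)
    for s in subjects
}
lowest = min(mean_r, key=mean_r.get)

for s in subjects:
    print(f"{s}: {mean_r[s]:.2f}")
```

Writing has the lowest mean correlation with the other subjects and Arithmetic the next lowest, in line with the remarks above.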

The two most likely explanations of the low intercorrelations 
among the school subjects are: (1) A specialization among the functions 
due primarily to original (inherited) aptitudes and (2) differences in the 
degree of possible achievement in each, depending largely upon the 
relative emphasis in teaching. 
For both views there is evidence; a greater abundance for the 
former. The importance of the latter possibility is suggested by the 
studies of Hollingworth¹ and more recently by those of G. S. Gates,² 
which show an increase in the intercorrelation of even very narrow 
functions (tapping, color naming, etc.) as the subjects approach a 
practice limit. If we could secure a random selection of 12-year-olds 
who had pushed achievement in reading, spelling and arithmetic to 
the very limit, the intercorrelations and the correlations with intelli- 
gence tests would probably be much higher than those now found. 
Whether arithmetic will yield high correlations with intelligence 
(assuming that we had a valid measure of general intelligence) in School 
X, Y or Z, probably depends considerably on what the school does in 
its teaching of that subject. Our data represent merely the facts for 
one school, but our impression is that it is rather unlikely that the 
correlations would rise near + 1.00 even if each subject were developed 
to the limit. There is doubtless some specialization due to native 
endowment. 

If this is true, two suggestions follow. The first is a question 
concerning the validity of the “accomplishment quotient,” which is 
based on the assumption of no (or, at least, slight) specialization. 
The other suggestion is that we should attack the problems of dis- 
covering tests of native ability for each subject separately. We should 
have tests for native aptitude for arithmetic, writing, drawing, spelling, 
and so on. 

6. Group Tests and Stanford MA Correlations with Particular Sub- 
jects—In Part III, Section I, it was found that the more verbal the 
material, the higher the correlation with all subjects except arithmetic 
which was more closely associated with moderately verbal material. 
This is suggestive of a starting point in future research for tests of 
aptitude for the different subjects. The Stanford test and many 
group tests include materials which rank high, low, and at various 
levels on the verbal scale. If there is a specialization of native abili- 
ties, the result is a rather moderate correlation with a composite of all 
school subjects. Table IX gives a comparison of MA and Group 
Test correlations with the particular subjects, the figures representing 
the mean for Grades IV to VI. 


1 Correlations of Abilities as Affected by Practice. Journal of Educational 
Psychology, 1913, p. 405. 
2 Doctor’s thesis (unpublished) in the Library of Columbia University. 

TABLE IX.—CORRELATIONS OF THE MEAN GROUP TEST AND STANFORD MENTAL 
AGE WITH ABILITY IN SCHOOL SUBJECTS. MEAN OF RESULTS FOR 
GRADES IV TO VI 

                         Reading   Reading  Arith-  Spell-  Composite
                         Compre-   Rate     metic   ing     Achieve-
                         hension                            ment
Mean Group Test¹          0.59      0.49     0.27    0.35    0.52
Stanford Mental Age       0.49      0.46     0.30    0.31    0.54

1 Data for Myers Test omitted. 


The mean group test stands a little higher than the Stanford on 
the verbal scale, according to judgments. On the scale 1.0 to 7.0, 
the Stanford is rated 4.8; the mean of the Group Tests (Myers omitted) 
is about 5.5. The amount of working time for the Binet is slightly 
greater than for the average group test, but as found in Part III, 
Section 2, this difference would not have a great influence. Table 
IX shows that the group test yields slightly higher correlations with 
reading and spelling, and a slightly lower correlation with arithmetic. 
What we get, then, is a moderate correlation with all subjects but a 
perfect prediction of none. 

7. Grade Differences in Correlations.—Table X is computed from 
the appropriate columns of Table IV. It gives the correlations when 
the coefficients for all group tests (except the Myers) for all grades are 
averaged. 


TABLE X.—SHOWING THE MEAN CORRELATIONS OF GROUP TESTS WITH: 

         Stanford  Reading   Reading  Arith-  Spell-  Composite
Grade    Mental    Compre-   Rate     metic   ing     Achieve-
         Age       hension                            ment
IV        0.38      0.59      0.43     0.30    0.47    0.54
V         0.52      0.63      0.45     0.22    0.29    0.49
VI        0.60      0.61      0.53     0.32    0.35    0.57
VII       ....      0.58      0.56     0.25    0.33    0.52
VIII      ....      0.50      0.43     0.25    0.33    0.47























There is a rise with the grade in the correlations of group tests with 
Stanford MA but the correlations of Group Tests with school subjects 
are about as high in the lower as in the upper grades. 

If the correlations of Group Tests and the criteria of Achievement 
are about the same from grade to grade while the Group Tests yield a 
higher correlation with MA as the grade becomes higher, it will follow 
that the correlations of MA and Achievement will similarly go up. 
The data of Table XI show this to be the case. 


TABLE XI.—CORRELATIONS OF STANFORD MENTAL AGE WITH: 

         Reading        Reading                          Composite
Grade    comprehension  rate     Arithmetic  Spelling    achievement
IV        0.36           0.23     0.35        0.11        0.43
V         0.41           0.56     0.25        0.37        0.51
VI        0.69           0.69     0.30        0.45        0.67











While the increase is fairly large and quite uniform, no good reasons 
appear in our data to account for it. 

In a preceding section it was found that the more non-verbal tests 
(Myers and Dearborn) showed a decrease in correlations with MA 
and achievement as the grade became higher, and in other sections it 
was found that the more verbal the material, the higher the correlation 
with attainment. 

The verbal group tests are the same from Grade IV up and the 
correlations with achievement are about the same. Since the same 
criterion is used with the Stanford, we should look to the Stanford 
test itself for an explanation of the rise as the grade becomes higher. 
The suggestion is that the tests in the Stanford scale become more 
verbal as the MA becomes higher. To infer this from our data would 
be risky since unsuspected factors may enter in (for example, the older 
the child the more time usually required). The reader, reviewing the 
Stanford scale, can judge for himself. 

The grade differences are important, if real. It will be worth 
while to devote the next section to a comparison in which the results 
for Grades I, II and III are included. 


PART IV. COMPARISON OF RESULTS FOR DIFFERENT GRADES 


Little weight can be given to a comparison of the results of one 
grade with those of another, especially where Grades I, II, and III are 
concerned. The measures of attainment, for one thing, are much 
less reliable in the lower grades; the range of ability is larger in the 
lower grades since careful grading is begun with Grade IV, and finally 
the content of the Intelligence Tests for the primary grades is different 
from that of the upper grades. This is true of the Stanford as well as 
the Group Tests. The results are given in Table XII. 





TABLE XII 

              1                  2                 3
Grade    Achievement        Achievement       Achievement
         with Mental Age    with Verbal       with Non-verbal
I            0.36              ....               0.30
II           0.44              ....               0.23
III          0.47              0.65               0.22
IV           0.42              0.54               0.22
V            0.51              0.49               0.17
VI           0.67              0.57               0.29
VII          ....              0.52               0.08
VIII         ....              0.47              -0.15














In the case of the correlations of achievement with MA, the highest 
grades show the highest coefficients, as was pointed out in the preced- 
ing section. That Grades II and III show a larger correlation than IV 
is probably due in part to the fact that the range of abilities is greater 
in the former. The range is great in Grade I also, but the validity of 
the measures of achievement is small, with a resulting attenuation. 
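
The attenuation mentioned above can be illustrated with Spearman's well-known relation between an observed correlation and the reliabilities of the two measures. The paper gives no reliability coefficients, so every figure in the sketch below is hypothetical; it shows only the direction and rough size of the effect when the achievement measure is poor.

```python
import math


def attenuated_r(true_r, reliability_x, reliability_y):
    """Correlation expected between two fallible measures, given the
    correlation between their true scores and the two reliabilities."""
    return true_r * math.sqrt(reliability_x * reliability_y)


# Hypothetical values: the same underlying relation of 0.60, observed
# with a dependable achievement measure and with an undependable one.
observed_upper = attenuated_r(0.60, 0.90, 0.90)
observed_lower = attenuated_r(0.60, 0.90, 0.50)

print(round(observed_upper, 2), round(observed_lower, 2))
```

With these invented reliabilities the same underlying relation yields an observed coefficient of about 0.54 in one case and about 0.40 in the other.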

Aside from the high correlation for Grade III for reasons just 
mentioned, the correlations of Verbal Tests and Achievement are 
about the same for all grades. The non-verbal materials show a very 
low correlation in Grades VII and VIII but we are not certain, by 
any means, that the data represent the state of affairs for non-verbal 
materials in general, since but one Non-verbal Test (Myers) has been 
used in Grades IV to VIII. The Dearborn Tests, which contain 
both verbal and non-verbal materials, show a similar but less pro- 
nounced tendency. 

For purposes of comparing one variable with another in the same 
grade, our data are valid. The Stanford MA and the Verbal Tests 
(where they are used) are clearly superior to the non-verbal. In fact, 
it was consistently found that the non-verbal materials added but little 
when the independent weights were found by the regression coefficient 
and by multiple correlation. 
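
How a second test can add "but little" may be seen from the standard formulas for a first-order partial correlation and for a multiple correlation with two predictors. The zero-order coefficients used below are hypothetical, chosen only to resemble the general pattern reported; they are not values from the study, and the function names are illustrative.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of y with two predictors, 1 and 2."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical zero-order correlations, not taken from the study.
r_ach_verbal = 0.60        # achievement with verbal test
r_ach_nonverbal = 0.25     # achievement with non-verbal test
r_verbal_nonverbal = 0.40  # verbal with non-verbal test

# Gain in prediction from adding the non-verbal test to the verbal test.
gain = multiple_r(r_ach_verbal, r_ach_nonverbal, r_verbal_nonverbal) - r_ach_verbal

# Achievement with non-verbal test, verbal test held constant.
unique_nonverbal = partial_r(r_ach_nonverbal, r_ach_verbal, r_verbal_nonverbal)

print(round(gain, 4), round(unique_nonverbal, 3))
```

With these invented figures the multiple correlation exceeds the verbal test's own coefficient by less than 0.001, and the partial correlation of the non-verbal test with achievement is close to zero.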

The Verbal Tests seem to yield higher correlations with achievement 
in Grades III and IV than does the MA; the coefficients are 
about the same for Grade V but thereafter the MA gives clearly a 
higher coefficient with achievement. 

So far we have found two factors which influence the correlations 
with achievement: (1) the more verbal the material in the test the 
higher the correlation and (2) the longer the test, the higher the 
correlation, other things being as equal as we could make them. In 
the Verbal Group Tests both time and verbalness are equal for the 
Grades III to VIII. For the Stanford Test, the older the child 
mentally, the greater the time required. Terman’s estimates of the 
times are:¹ 

Children 6-8 years old ............ 30-40 minutes 
Children 9-12 years old ........... 40-50 minutes 
Children 13-15 years old .......... 50-60 minutes 
Adults ............................ 60-90 minutes 


It is our impression that the tests become more verbal also in the 
higher levels. If so these two factors would account for the increasing 
coefficients. 

Other explanations may be offered; for example, the higher levels 
may yield results of higher reliability. The evidence, however, is 
against this supposition.² The matter of reliability or constancy must 
not be confused with that of validity. It may be that the materials 
in the upper areas of the Terman are more valid, when constancy, time, 
verbalness, etc. are equal. We have no data on this point except the 
general finding that greater verbalness has meant greater validity 
when school attainment is the criterion. Another possibility is that 
the Stanford Test is really equally valid all the way, but that the 
correlation becomes higher as the pupils become more proficient, 
as they hit their stride, in the upper grades. The Group Tests, it 
might be argued, give equally high correlations all along, because they 
include a greater amount of reading and arithmetic and are largely 
measuring achievement directly. On this point we have some data. 
It was found, for example, that in Grade III where the pupils were 
rather inefficient in reading and writing at the beginning of the year, 
the class fell far below the norms for their age and grade in the 
National Intelligence Tests while their mean IQ in the Stanford was 
about the same as that found for other grades, as we were expecting. 
The correlations of National Scores and Achievement, however, were 
as high as those found in other grades. These findings indicate that 
the Stanford Test is much less subject to the influence of school 
training and, if so, its general usefulness would be very much greater. 
It is our hope to check up this matter by comparing achievement in this 
and succeeding years, with the results of the various tests given in 
1920. 

1 Terman, L. M.: “The Measurement of Intelligence,” p. 127. 
2 Rugg, Harold and Colloton, Cecile: Journal of Educational Psychology, 
September, 1921, p. 319. 


GENERAL SUMMARY AND CONCLUSIONS 


1. Other things being equal, the more verbal the material, the 
higher the correlation with school attainment. 

(A) In Grades I and II, the Non-verbal Tests gave low correlations 
with Achievement (0.30 and 0.23, respectively) compared to 0.36 and 
0.44, respectively, between Achievement and the Stanford-Binet, 
which is more verbal. 

(B) In grade III, a group of Non-verbal Tests gave a mean corre- 
lation of 0.22 with achievement as compared to 0.65, the mean correla- 
tion of a group of Verbal Tests with Achievement. In this grade, 
the Non-verbal Tests required a longer average time than the Verbal. 

(C) The only wholly Non-verbal Test (Myers) used in grades IV 
to VIII, gave much lower correlations than Verbal tests. The Dear- 
born Test, combining both materials, gave a higher correlation than 
the Myers, but a lower correlation than the mean Verbal Group Tests. 

(D) When the materials of all tests (Grades IV to VIII) were 
arranged on a scale from the least to the most verbal and broken into 
four steps, each representing one hour teaching time, it was found that 
the more verbal the material the higher the correlation with the 
composite of achievement. 

(E) When the individual Group Tests were arranged by degree of 
verbalness, time being eliminated by the technique of partial 
correlation, the independent correlation (first-order partial r) 
with Achievement was 0.69. 

2. Verbalness being equal, the greater the length of the tests, the 
higher the correlation with achievement. 

(A) For Grades I and II, all tests being non-verbal, the mean 
correlation between length of test and magnitude of the mean 
correlations with all criteria was 0.69 when the SDs are made equal by 
use of the Rank method of correlation. Allowing the SDs to remain as 
they are (Product-Moment formula), the correlation is 0.49. In this 
case, the SD for length (time) of tests is very large compared to the 
SD for the r’s. 

(B) In Grade III, the Product-Moment correlation of length 
and magnitude of r’s with achievement is 0.76 for non-verbal tests, 
and 0.81 for the verbal tests. 

(C) In the upper grades, the degree of verbalness varies so much 
that comparisons could be made only by the use of partial correlations. 
The partial correlation between achievement and time (verbalness 
constant) was 0.21. 

3. The degree of verbalness outweighs the length of the test 
as a factor determining the correlations with achievement. 

(A) Using the combined data for Grades IV to VIII, the following 
weights were obtained by the regression equation: 

1. Weight of verbalness, 1.00 
2. Weight (B) of time, 0.224 

(B) Combining time and verbalness perfectly by the weights 
given above, a multiple correlation with achievement of 0.725 is 
obtained as compared to a partial correlation of 0.69 which verbalness 
alone yields, or 0.21 which time alone yields. 

(C) The Stanford-Binet yields higher correlations in the upper 
grades than in the lower grades (results up to Grade VI only available). 
This increase is probably—but not certainly—due to (1) increasing 
verbalness of material in upper levels, and (2) increase in the time spent 
in the test. 

4. When either the Stanford Test, or a verbal group test has been 
given, the independent contribution of the other, obtained by the 
regression equation, multiple or partial correlation, is not very great 
but probably important. 

(A) In Grade III, the mean verbal test gives a correlation 
with achievement of 0.65. The addition of Stanford MA, perfectly 
weighted, raises the correlation (multiple r) to 0.699. 

(B) In Grades IV, V, VI, taking mean results, the Stanford MA 
gives a correlation with achievement of 0.54. Adding the independent 
elements of the mean verbal group test, the multiple r becomes 0.605. 

5. A measure of “School Attitude” obtained by judgments of 
teachers yields an average correlation of 0.32 with achievement, but 
this factor, in so far as it contributes to school success, is almost wholly 
included in the measures given by a combination of the Stanford-Binet 
and an average Group Test. For example: 


Simple r, Achievement with Stanford MA = 0.54. 

Multiple R, Achievement with (MA + Group Test) = 0.605. 

Multiple R, Achievement with (MA + Group Test + School 
Attitude) = 0.611. 


6. The Stanford Test and the Verbal Group Tests yield very nearly 
the same correlations with particular school subjects, the former cor- 
relating relatively high with arithmetic, the latter with reading and 
spelling. 

(A) Moderately verbal material yields higher correlations with 
arithmetic than extremely verbal, but neither gives a satisfactory 
correlation. Extremely verbal yields higher correlations with Read- 
ing Comprehension, Reading Rate, Spelling and the Stanford-Binet. 

7. The inter-correlations of school subjects are not high with 
the exception of Reading Comprehension with Reading Rate, which is 
0.85. 

(A) This fact suggests the need of specific tests for native aptitude 
for each subject. 

(B) It raises a question with regard to the validity of the concept 
of the “Accomplishment Quotient” and similar practices based on the 
assumption of slight specialization. 

(C) It suggests the need of correlating tests with abilities developed 
to the limit, rather than with abilities which are developed more or less 
according to the practices of the particular school. 
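
The partial and multiple correlations cited in this summary can be illustrated with a short sketch. The Python functions below give the standard first-order partial correlation and the two-predictor multiple correlation. The function names and the three input coefficients are hypothetical, chosen only for illustration, since the zero-order correlations behind the reported figures are not given here.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def multiple_r(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation of y with two optimally weighted predictors."""
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared)


# Hypothetical zero-order correlations (NOT taken from the study):
r_ach_verbal = 0.70   # achievement with verbalness
r_ach_time = 0.30     # achievement with length (time) of test
r_verbal_time = 0.20  # verbalness with length (time) of test

# Verbalness with achievement, time held constant.
print(round(partial_r(r_ach_verbal, r_ach_time, r_verbal_time), 3))
# Achievement with verbalness and time combined.
print(round(multiple_r(r_ach_verbal, r_ach_time, r_verbal_time), 3))
```

With any such inputs the multiple correlation cannot fall below the larger of the two simple correlations, which is the sense in which a second measure adds an "independent contribution."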
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CORRELATIONS BETWEEN BINET TESTS AND 
GROUP TESTS 


W. T. ROOT 
University of Pittsburgh 


In the fall of 1920 the author supervised the testing of some 600 
children in the schools of Monessen, Pa., with the Binet-Simon Tests, 
Stanford Revision. Early in 1921 it became possible, with the aid 
of Mr. Herman Gress¹ and Mr. Wade Blackburn,¹ to give a battery of 
mass tests to the same group. They consisted of the following: Otis 
Primary A (O. P. A); Otis Advanced A (O. A. A); Haggerty Sigma I 
(H. Sigma I); Haggerty Delta I and II (H. Delta I and II); National 
A. I and B. I (N. A. I and N. B. I); Terman Group A (T. G. A); 
Mentimeter (M.); Dearborn Series I and Series II (D. S. I and D. S. II); 
and Illinois I and II (Ill. I and Ill. II). 

The usual statistical precautions were observed. All scores and 
correlations were rechecked. The greatest precautions were taken 
to secure uniform conditions throughout the testing. Only the splen- 
did cooperation of the Monessen teachers made this possible. The 
mass tests were given weekly, on the same day at the same hour. 
The Dearborn Series I was given in two sittings. About 416 pupils from 
Grade I to Grade XII were given both the mass and the Binet tests. 
In correlating, Grades XI and XII are combined. Pearson’s Product- 
Moment formula was used in correlating. . 
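
The probable errors (P. E.) reported with the correlations can be checked with a short sketch. It assumes the classical formula P. E. = 0.6745 (1 - r²) / √N, which the article does not state explicitly; the function name is hypothetical. The two cases used are rows of the tabulation (Grade 1: 87 cases, r = 0.72, P. E. 0.03; Grade 2: 34 cases, r = 0.60, P. E. 0.07).

```python
import math


def probable_error_of_r(r: float, n: int) -> float:
    """Classical probable error of a product-moment r (assumed formula)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# Two rows from the tabulation: (grade, number of cases, r, printed P. E.)
rows = [(1, 87, 0.72, 0.03), (2, 34, 0.60, 0.07)]

for grade, n, r, printed in rows:
    computed = probable_error_of_r(r, n)
    # Both computed values round to the printed two-decimal figures.
    print(grade, round(computed, 2), printed)
```

Because the probable error shrinks only with the square root of the number of cases, the small upper-grade groups carry noticeably larger probable errors than the pooled groups.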

It has been assumed, in making the correlations, that the Binet 
Tests constitute the truest estimate of intelligence in so far as tests go. 
This may not always be the case with older (college) students as 
indicated in a recent article by De Camp,² but probably no one will 
take exception to the assumption that with children up to 15 or 16 
years of age, the Binet Test constitutes the best single test estimate of 
the intelligence that can be made. Granting this, the correlation of 
any mass test with the Binet Test becomes of immense importance in 
estimating the value of the former. The administrator is anxious to 
know what mass test is most suitable for a particular grade or a 
particular group of grades. It is hoped the accompanying correlation 
tabulations are a step in this direction. 


1 Mr. Gress is Superintendent of Schools, Monessen, and Mr. Blackburn is 
Supervisor of the grammar grades. They hope later to present a careful analysis 
of causes of variation in the correlations, and also an analysis of the causes 
leading to marked individual inconsistency in performance from test to test. 

2 De Camp, J. E.: Studies in Mental Tests. School and Society, Vol. XIV, 
pp. 253-258. 


TABULATION OF CORRELATIONS 

Tests                       Grade   Number    R     P. E. R 
Binet with O. P. A            1       87     0.72    0.03 
                              2       34     0.60    0.07 
                              3       36     0.63    0.07 
                              4       38     0.77    0.04 
                             all     198     0.80    0.02 
Binet with O. A. A            5       26     0.64    0.08 
                              6       32     0.46    0.09 
                              7       31     0.76    0.05 
                              8       45     0.68    0.05 
                              9       22     0.72    0.07 
                             10       25     0.55    0.09 
                            11-12     37     0.44    0.09 
                             all     218     0.80    0.02 
Binet with H. Sigma I         1       88     0.47    0.06 
                              2       36     0.46    0.09 
                              3       36     0.61    0.07 
                             all     160     0.74    0.02 
Binet with H. Delta I         1       88     0.71    0.04 
                              2       36     0.28    0.10 
                              3       37     0.57    0.07 
                             all     162     0.76    0.02 
Binet with H. Delta II        3       36     0.62    0.07 
                              4       40     0.69    0.06 
                              5       25     0.58    0.09 
                              6       32     0.60    0.08 
                              7       31     0.82    0.04 
                              8       44     0.79    0.06 
                              9       22     0.44    0.12 
                             all     232     0.84    0.01 
Binet with N. A. I            3       36     0.69    0.06 
                              4       41     0.68    0.06 
                              5       26     0.66    0.07 
                              6       32     0.72    0.06 
                              7       31     0.79    0.05 
                              8       45     0.51    0.03 
                             all     211     0.84    0.01 
Binet with N. B. I            3       35     0.67    0.06 
                              4       41     0.65    0.06 
                              5       26     0.69    0.07 
                              6       32     0.63    0.07 
                              7       31     0.67    0.07 
                              8       45     0.49    0.08 
                             all     210     0.86    0.01 
Binet with T. G. A            7       31     0.73    0.06 
                              8       45     0.65    0.06 
                              9       22     0.35    0.13 
                             10       25     0.67    0.07 
                            11-12     37     0.53    0.08 
                             all     160     0.75    0.02 
Binet with M.                 1       86     0.65    0.04 
                              2       35     0.49    0.09 
                              3       36     0.60    0.07 
                              4       39     0.68    0.06 
                              5       26     0.71    0.07 
                              6       32     0.53    0.09 
                              7       31     0.71    0.06 
                              8       45     0.61    0.06 
                              9       22     0.43    0.12 
                             10       25     0.68    0.07 
                            11-12     36     0.54    0.08 
                             all     407     0.88    0.01 

The following observations may be made directly from the tabula- 
tion: 

1. Grade I. D. S. I correlates the highest with the Binet (0.79); 
O. P. A next (0.72); and H. Sigma I, last (0.46). (The Sigma test, 
it will be recalled, is a reading test and not strictly an intelligence 
test.) The D. S. I would seem then to be the most suitable for this 
grade but has the disadvantage of requiring 2 days to give, owing to 
its extreme length. 

2. Grade II. Judged by their correlations with the Binet Test, 
none of the mass tests proved satisfactory with Grade II. The 
O. P. A has the highest correlation (0.59); the M., next (0.49); H. 
Sigma I, next (0.45); the D. S. I, next (0.40); and the H. Delta I, 
last (0.28). 

3. Grade III. All of the mass tests are here more satisfactory, 
giving a correlation coefficient within the neighborhood of 0.60. The 
D. S. I is apparently most suitable (0.71); with the N. A. I a close 
second (0.68). The H. Delta I is least suitable (0.57). It will be 
noted that H. Delta II yields a higher coefficient (0.62) than H. 
Delta I. 

4. Grade IV. The O. P. A (0.77) is decidedly higher than the 
next most suitable test, the H. Delta II (0.69). The remaining tests, 
it will be noted, all lie within the 0.60s. 

5. Grade V. The Ill. I gives the highest coefficient (0.75), with 
the M. a little below (0.71). The remainder of the tests lie within 
the 0.60s except the H. Delta II, whose coefficient falls to 0.58. 

6. Grade VI. There is a marked difference in the correlations for 
this grade. The D. S. II has a coefficient of 0.74, with the N. A. I 
next (0.72), while the O. A. A falls lowest (0.46). 

7. Grade VII. The highest correlation is with the H. Delta II 
(0.82), with the N. A. I a little lower (0.79). All of the correlations 
are high for this grade, lying within the 0.70s, with the exception of 
N. B. I (0.67). 

8. Grade VIII. The correlations for this grade cover a wide range, 
H. Delta II being highest (0.79), while N. A. I (0.51) and N. B. I (0.49) 
are the lowest. The National correlates highly with the Binet except 
for this one grade. 

9. Grade IX. The correlations are here low, and the small number 
of cases makes the P. E.s high. The O. A. A stands highest (0.72); 
then a drop to a correlation of 0.47 with D. S. II. The T. G. A stands 
last (0.35). 

10. Grade X. Only three of the mass tests given cover this grade. 
The Mentimeter stands highest (0.68); the T. G. A a close second 
(0.67); and O. A. A, decidedly lower (0.54). 

11. Grades XI and XII. The M. and T. G. A each give a correla- 
tion of 0.53. The O. A. A falls to 0.43. None of the tests are as 
satisfactory as with the lower grades. 

12. Grades I to IV inclusive. Considering uniformity of high 
correlation the O. P. A seems best for these grades. It should be 
noted though that the Otis falls low on the Grades I and III. No 
test is entirely satisfactory. 

13. Grades III to VI inclusive. It is sometimes desirable to con- 
sider these grades together. D.S. I and II make the highest and most 
uniform correlations; while the N. A. I and N. B. I make a close 
second. 

14. Grades V to VIII inclusive. For these grades D. S. II is 
most desirable with H. Delta II of nearly equal value. 

15. Grades VII and VIII. If these two grades are grouped 
together, H. Delta II is seemingly far superior. 

16. Grades VII to IX inclusive. With the increase of junior high 
schools this grouping is now frequent. The O. A. A is apparently best. 

17. Grades IX to XII inclusive. Grouping these grades, the O. A. 
A is perhaps most satisfactory. 

18. Grades VII to XII inclusive. Considering these five grades 
together, there is little choice between O. A. A, T. G. A, and M. The 
author favors the T. G. A because it is very easy to administer, 
requires but 35 minutes to give, and is the simplest to score. 

19. It will be noted that when the grades are pooled and correlated 
with the Binet, a high correlation for “all” in the tabulation is not 
indicative of the correlation for any particular grade. 

20. Grades I to XII inclusive. The highest general tendency to 
correlation with the Binet is with the Mentimeter (0.88). The Dearborn 
test is a close second (0.87). As these two tests cover somewhat differ- 
ent abilities (M. placing a premium on language ability, and D. on 
non-language ability) the writer suggests this combination as being the 
best, if two tests can be given to the entire 12 grades. The writer has 
been able to get more from these two series (Mentimeter and Dearborn) 
when he is desirous of making individual analysis than with any other 
combination of two mass tests. 

However, if only one mass test can be given, the varied character 
of the Otis test makes it more valuable in analysis than either the 
Dearborn or the Mentimeter alone. 

The Dearborn proved difficult to give and needs shortening and 
simplifying, but when this is done the author feels that this will prove 
to be one of our very best tests. All of the difficulties could be easily 
rectified. At present it is not easy for the average teacher to give, and 
errors in scoring are much more frequent than with other mass tests. 
Even as it stands, it is certainly a superior test with certain foreign 
children who have not yet mastered the English idiom—and this 
failure of mastery (with a foreign language in the home) is a bigger 
problem than is usually realized by those giving mass tests. 

21. The correlations between the various mass tests are higher and 
more uniform than between the mass tests and the Binet Series. The 
following correlations are conspicuously high: 


IN. A. T with Ih. Bs By ee BAG... noc cc ceccecwees 0.94 
a 8 er ee ee 0.93 
Bas eee Gs Bs i I BO. 0k dies cadicccociscacss OF 
eS fF 8 eee er oe eer ee ere 0.92 


CAUSES FOR SIGNIFICANT VARIATION IN CORRELATIONS 


The following are probably the chief causes for variation from test 
to test, and from grade to grade, in the correlations presented here: 

1. Probably the greatest single factor is the difference in weight 
that different mass tests attach to different abilities. If a few rough 
captions are made, such as linguistic ability, arithmetical ability, etc., 
and the percentage of value attached to each caption in the various tests 
is listed, it will be found that a marked difference exists in the relative 
value attached to any caption as we go from one mass test to another. 
It would often seem that when the maker of the test had 10 arithmetic 
problems, that ability got 10 points in score; if he had 5 puzzles, puzzles 
scored 5; and if he happened to have on hand 20 completion sentences, 
completion of sentences got 20 points to the score. Be that as it may, 
it is certain that chance, rather than any knowledge of the relative merits 
of different elements in the intelligence-complex or compound, 
determines the proportion of any particular kind of psychological or 
pedagogical test. The difference in proportion is on the whole more 
noticeable and probably more significant as the cause of variation in score 
from test to test than differences in the kind of test used by different 
mass-test compilers, referring, of course, to the omnibus type of 
psychological test. 

It will also be found that not only does a difference exist in the 
weighting of the test as a whole, but taking a certain region of the test 
likely to be answered, say by a Grade VI pupil, one mass test will differ 
radically from another both in the weight attached to different psy- 
chological captions and in the captions themselves. 

2. As indicated in the line above, the mass tests differ from one 
another not only in the weighting of various captions but also in the 
actual captions included in the omnibus, or in the region of a particular 
grade. 

3. H. Sigma correlates relatively poorly with the Binet, probably 
because it is essentially a reading test, and also because it demands a 
certain degree of reading ability. In cases where the child could read 
the line from the test but had attention riveted on the mechanics of 
reading, no action followed. Another cause of failure to respond to 
the test seemed to be an aversion to making a mark on the printed page. 

4. It is conceivable that certain local grade conditions can play an 
important part; methods of teaching, predominance of certain foreign 
elements of a particular race, stress on certain school subjects, etc. 

5. The Binet Test is largely independent of the element of time; 
mass tests must of necessity rest on a time basis. We do not know to 
what extent different subjects are benefited in one case and injured 
in the other, or vice versa. 

6. Finally, marked change in the rank-order of an individual from 
one mass test to another or from mass test to the Binet may rest upon 
various individual differences. An analysis of such cases with a close 
study of the causes operating in an individual case, is a much needed 
task but beyond the scope of this preliminary report. 




















A METHOD OF INFERRING THE CHANGE IN A COEF- 
FICIENT OF CORRELATION RESULTING FROM 
A CHANGE IN THE HETEROGENEITY 
OF THE GROUP 


ARTHUR S. OTIS 
Yonkers-on-Hudson, N. Y. 


Let us suppose we know the correlation between two variables, 
x and y (as for example, the scores in Forms A and B of a group test 
of mental ability), calculated from data derived from a group of a cer- 
tain heterogeneity as, for example, the pupils of a single grade, and let 
us suppose it is desired to know what would be the coefficient of 
correlation between the same variables in the case of a group of differ- 
ent heterogeneity as, for example, the pupils of several grades combined. 
The method of determining the influence of the change in heterogeneity 
of the group is as follows: 

Let $r_{xy}$ equal the coefficient of correlation between x and y in the 
first instance. 

Let $r'_{xy}$ equal the coefficient of correlation between x and y in the 
second instance. 

Let $\sigma_y$ equal the standard deviation of the y values in the first 
instance. 

Let $\sigma'_y$ equal the standard deviation of the y values in the second 
instance. 

To find $r'_{xy}$ from $r_{xy}$, $\sigma_y$, and $\sigma'_y$, solve the formula: 

$$r'_{xy} = 1 - (1 - r_{xy})\,\frac{\sigma_y^2}{\sigma_y'^2}$$

The derivation of this formula is as follows: 
By the Otis difference formula¹ for correlation, 

$$r_{xy} = 1 - \frac{\sigma_d^2}{2\sigma_y^2} \qquad (1)$$

in which 

$$d = y - \frac{\sigma_y}{\sigma_x}\,x. \qquad (2)$$

The quantity d is therefore the vertical distance of any point (x, y) 
in the correlation plot from the line of relation, the equation of which is 
$y = \frac{\sigma_y}{\sigma_x}\,x$. It is the difference, in units of the y scale, between the 
values of x and y for that particular case when the value of x has been 
transmuted into terms of y. The value of d in our suppositional case, 
therefore, is a measure of the amount of discrepancy between the two 
scores of a single individual, measured in terms of the y scale. 


1 This formula was first proposed by the writer in an article entitled The 
Reliability of Spelling Scales, Involving a Deviation Formula for Correlation, 
School and Society, Oct. 28 to Nov. 18, 1916. It was later called the “difference 
formula” and the derivation shown in The Reliability of the Binet Scale and 
Pedagogical Scales, Journal of Educational Research, September, 1921. So 
far as the writer is aware, this formula had not been proposed by any other writer. 

Now there may be, of course, a noticeable tendency for the dis- 
crepancy between the scores in the two forms of any test to vary with 
the magnitude of the scores. For example, if there were a tendency 
for the scores in the two forms to deviate less in the lower ranges, 
this fact would be evidenced by a pear-shaped appearance of the 
scatter diagram. But if the scatter diagram has a full elliptical 
appearance, and thus gives no suggestion of any tendency for the scores 
of an individual in the two forms to deviate more in one part of the 
scale than in another, then it would be fair to assume that the tendency 
to deviation was the same throughout the whole range. In this case 
the value of $\sigma_d$ would tend to be constant, and we could assume it to 
be constant, for all degrees of heterogeneity. 

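In that case, 

$$r'_{xy} = 1 - \frac{\sigma_d^2}{2\sigma_y'^2} \qquad (3)$$

in which $\sigma_d^2$ is the same as in equation (1). 

We are now in a position to derive $r'_{xy}$ from $r_{xy}$, knowing $\sigma_y$ and $\sigma'_y$. 

Solving equation (1) for $\sigma_d^2$, we have 

$$\sigma_d^2 = 2(1 - r_{xy})\,\sigma_y^2 \qquad (4)$$

Therefore 

$$r'_{xy} = 1 - \frac{2(1 - r_{xy})\,\sigma_y^2}{2\sigma_y'^2}, \qquad (5)$$

whence 

$$r'_{xy} = 1 - (1 - r_{xy})\,\frac{\sigma_y^2}{\sigma_y'^2} \qquad (6)$$

or 

$$r'_{xy}\,\sigma_y'^2 = \sigma_y'^2 - \sigma_y^2 + r_{xy}\,\sigma_y^2$$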

It must be remembered that this method does not apply to irregu- 
lar scatter diagrams. 
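
As a minimal sketch, formula (6) can be written as a small Python function. The function name and the numerical values are hypothetical and serve only to illustrate the arithmetic: a correlation of 0.80 found in a single grade whose standard deviation is 10 points, carried over to a combined group whose standard deviation is 20 points.

```python
def r_for_new_heterogeneity(r_xy: float, sd_y: float, sd_y_new: float) -> float:
    """Formula (6): r' = 1 - (1 - r) * sd_y**2 / sd_y_new**2.

    Assumes, as the derivation does, that the standard deviation of the
    differences d is the same in both groups (a regular scatter diagram).
    """
    return 1.0 - (1.0 - r_xy) * (sd_y**2) / (sd_y_new**2)


# Hypothetical values: single-grade r = 0.80, SD = 10; combined-grade SD = 20.
r_combined = r_for_new_heterogeneity(0.80, 10.0, 20.0)
print(round(r_combined, 3))  # 1 - 0.20 * 100 / 400 = 0.95

# With unchanged heterogeneity the coefficient is unchanged.
print(round(r_for_new_heterogeneity(0.80, 10.0, 10.0), 3))
```

Doubling the standard deviation thus reduces the quantity (1 - r) to one-fourth of its former value, which shows how strongly a coefficient from pooled grades may exceed that of any single grade.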














HOW THE DEARBORN INTELLIGENCE 
EXAMINATION STANDARDS WERE 
OBTAINED 


WALTER F. DEARBORN 
AND 
EDWARD A. LINCOLN 


Psycho-educational Clinic, Harvard University 


The customary way of standardizing any test is to give it to as 
many children as possible, and combine the results to get norms which 
are somewhat impressive because of the large numbers upon which 
they are based. The theory of this procedure seems to be sound, but 
in practice it has given rise to some serious difficulties. Complaints 
have been heard that the results of some of the tests place whole 
classes very much too high or too low, and that the rankings obtained 
on two or more tests are sometimes widely different. 

In the hope of obviating some of these difficulties a new method 
was tried in the standardizing of the Dearborn tests. The Series II 
examinations were given in three towns in every grade from the second 
through the senior class in the high school. It is hard to say just 
what a typical American town is, but the towns selected do not seem 
specialized in any way. In each of them agriculture is carried on to 
a considerable extent, but each also does considerable manufacturing. 
They are large enough to support fairly large numbers of small busi- 
ness men, and are near enough to Boston so that the large city is a 
fairly open field for the inhabitants. There is in each town a fair 
sprinkling of children of foreign parents. 

The scores were not lumped, but the results from each community 
were treated separately. They were distributed by months, so that 
it was possible to find not only the median score for the pupils of each 
year, but the median age as well. It has heretofore been the assump- 
tion that the children of a certain age have a median exactly at the half 
year, that is, for example, the children from 13.0 to 13.99 years old 
have a median of 13.5 years. This supposition was found to be incor- 
rect in relation to the pupils studied for these standards. In one 
community the median 11 year old was only 11.33, and there were 
many smaller variations. 
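
The check on the half-year assumption can be sketched as follows. The ages are hypothetical, not the authors' data, and the function name is illustrative; the sketch only shows how the median age of the children in each year group may be found directly rather than assumed to fall at the half year.

```python
from statistics import median


def median_age_by_year(ages_in_months):
    """Group ages by completed year; return {year: median age in years}."""
    groups = {}
    for months in ages_in_months:
        groups.setdefault(months // 12, []).append(months / 12)
    return {year: round(median(vals), 2) for year, vals in sorted(groups.items())}


# Hypothetical ages in months for two year groups (11-year-olds, 13-year-olds).
sample_ages = [132, 133, 134, 136, 137, 140, 156, 162, 167]

for year, med in median_age_by_year(sample_ages).items():
    # Compare each observed median with the conventional half-year value.
    print(year, med, year + 0.5)
```

Where the observed median departs from the half year, as in the 11-year-old group of this made-up sample, a standard plotted at the assumed age would be misplaced along the age axis.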

When the median scores and ages were obtained they were plotted 
as in the accompanying diagram. On this diagram points were 
chosen at each half year for standards. These were taken with the 
attempt to make such standards that the median child of each age 
should have an intelligence quotient within the normal group (0.90 
to 1.10) no matter which community he was in. This criterion is very 
admirably fulfilled. In the twelfth year, where the discrepancies are 
the greatest, the median child in the lowest scoring group has an 
IQ of 0.93, and the median child in the highest scoring group has 1.04 
for an IQ. The other deviations from 1.00 are much smaller. 

It is very likely that classes, schools, and possibly one or two school 
systems will be found in which the distribution of intelligence quoti- 
ents will be rather decidedly skewed in one way or the other. 

It is believed, however, that in most of these cases the reason for 
the skew will be apparent. The authors have found for example, that 
in the foreign section of a city where the adults are engaged mostly in 
unskilled or semi-skilled labor the intelligence quotients on both group 
and individual examinations are likely to run low. It frequently 
appears, as may be seen on the accompanying diagram, that the 
pupils of a certain age or a certain grade are out of line with what 
seems to be the general tendency of the pupils in the community. 

Series I was standardized in the same way as Series II, although 
it was not practicable to get results from so many upper grade children, 
and thus the standards from the twelfth year on had to be estimated 
somewhat from the continuation of the lines at their upper ends. 

This method is especially valuable in that it exposes facts which are 
concealed when results are thrown together, and thus more intelligent 
treatment of the data is possible. 


THE SIGNIFICANCE OF ALPHA IN COLLEGES 


CHARLES LEONARD STONE 
Dartmouth College 


Despite the fact that the Alpha examination was designed for 
purposes very unlike academic functions, much interesting material 
has been gathered from colleges and universities in the past two years. 
This statistical study, summarizing the results of the Alpha test given 
at Dartmouth College to 633 freshmen in the Fall of 1919 and to 622 
freshmen in the Fall of 1920, is in answer to six important questions 
asked by college administrations: (1) How does the intelligence test 
correlate with total scholarship? (2) Is there a prognostic indication 
in the case of men separated, men on probation, and men with superior 
scholarship records? (3) How does the test correlate with individual 
subjects? (4) Is there an increasing superiority shown by the test 
scores as we ascend from E to A men in each subject? (5) What 
percentage of exceptions is there at the A and E ends of the scholarship 
scale? (6) Have the individual tests of the group examination any 
diagnostic significance with relation to specific subjects of study? 


I 


With the class of 1924 a definite endeavor was made, by administra- 
tion influence, by an article in the college paper, and in general by 
campus tradition, to have all men take the test seriously. It seems 
safe to assume that this effort was in significant degree responsible for 
the higher scores, and very possibly for higher correlations, as 
compared with the class of 1923. In the class of 1923 the correlation 
with first semester grades was 0.438 ± 0.022; with second semester 
grades 0.333 ± 0.026; in the class of 1924 the first semester 
correlation with total scholarship was 0.498 ± 0.021. 


II 


When presented in terms of averages there is not a wide range 
between the men who attain a general average of B or better (173.0) 
and those who are separated from college for scholarship reasons 
(139.1). But when the distribution of these groups of scholarship is 
shown in terms of the intelligence quarters, the predictive significance 
of the test seems more hopeful. 


                          Separated,   Probation,   B or better, 
                           per cent     per cent      per cent 
Highest quarter.......        2.3         13.7          55.6 
Second quarter........       18.2         19.6          26.7 
Third quarter.........       18.2         23.5          11.1 
Lowest quarter........       61.4         43.1           6.7 


Of the 24 men lowest in the Alpha test in the fall of 1919 (below 
110) twelve were eliminated within a year, whereas only five of the 
highest 102 (above 169) were so disposed of. 


III 


The following tabulation shows in the first column of figures the 
correlation of Alpha scores with first semester performance in the 
various freshman subjects, and in the second column the correlation 
of total scholarship with the individual subjects: 


Subject                            Alpha            Total scholarship 
..............................  0.443 ± 0.163       0.882 ± 0.046 
..............................  0.294 ± 0.047       0.842 ± 0.015 
English.......................  0.497 ± 0.021       0.712 ± 0.014 
..............................  0.304 ± 0.030       0.739 ± 0.015 
..............................  0.119 ± 0.043       0.693 ± 0.022 
..............................  0.363 ± 0.055       0.766 ± 0.026 
Mathematics...................  0.3879 ± 0.026      0.753 ± 0.016 
..............................  0.444 ± 0.053       0.707 ± 0.032 
..............................  0.306 ± 0.040       0.768 ± 0.017 
..............................  0.220 ± 0.045       0.736 ± 0.021 
Graphics......................  0.111 ± 0.083       0.548 ± 0.058 
..............................  0.313 ± 0.031       0.730 ± 0.016 
Physical education............  0.198 ± 0.026       0.541 ± 0.019 


Some of these Alpha correlations are fairly indicative. Total 
scholarship has much higher correlations; but of course each subject 
is a considerable ingredient of total scholarship, and total scholarship 
in the first semester has undoubtedly lower correlations with specific 
subjects in later college years. 

We are not so vitally interested in high correlation through the 
middle ranges of intelligence and scholarship, however. The signifi- 
cance of the correlation at the extremes may well be brought out by 
noting what per cent of the men of each scholarship grade are found 
in“the highest and lowest quarters of intelligence and of scholarship. 
The following data concern such relationships in English (highest 
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correlation with Alpha), Graphics (lowest correlation with Alpha), 
and the general average of all subjects. 


                            A          B          C          D          E
                         Per cent   Per cent   Per cent   Per cent   Per cent

IN HIGHEST QUARTER OF INTELLIGENCE
English.................   47.1       48.6       24.0       10.8        2.1
Graphics................   42.9       20.0       24.1       16.7       50.0
Average.................   48.2       39.3       23.4       17.6       11.4

IN HIGHEST QUARTER OF SCHOLARSHIP
English.................   70.6       62.7       18.1        1.6        0.0
Graphics................   71.4       27.3       17.2        0.0        0.0
Average.................   86.6       56.4       18.8        4.0        1.1

IN LOWEST QUARTER OF INTELLIGENCE
English.................    0.0        6.5       21.4       41.7       68.8
Graphics................    0.0       40.0       24.1       16.7        0.0
Average.................    9.0       14.0       21.8       33.1       44.0

IN LOWEST QUARTER OF SCHOLARSHIP
English.................    0.0        1.5       17.8       49.2       87.5
Graphics................    0.0       22.7       17.2       66.7      100.0
Average.................    1.0        3.3       14.6       44.2       74.8














It is only fair to note that the data on English include 594 cases, those
on Graphics only 64.
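
The effect of so small a number of cases may be gauged from the figures attached to the correlations. The sketch below assumes that these are probable errors computed by the conventional formula PE = 0.6745(1 - r²)/√N; the article does not state its formula, so this is an assumption. It takes 0.497 as the Alpha correlation for English and 0.111, the lowest Alpha correlation in the table, as that for Graphics.

```python
import math


def probable_error(r: float, n: int) -> float:
    """Conventional probable error of a correlation coefficient r based on n cases."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# Correlations with Alpha and numbers of cases as given in the text.
pe_english = probable_error(0.497, 594)
pe_graphics = probable_error(0.111, 64)

print(f"English:  r = 0.497, PE = {pe_english:.3f}")
print(f"Graphics: r = 0.111, PE = {pe_graphics:.3f}")
```

Under that assumption the 64 Graphics cases give a probable error about four times that of the 594 English cases, which is why the Graphics percentages above fluctuate so widely.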


IV 


As observed before, the averages do not represent the divergences 
of ability very markedly, but the following data show, on the whole, 


some superiority of each scholarship grade over the scholarship grade 
just below. 


                              A        B        C        D        E
English...................  164.3    161.7    149.9    138.9    125.4
Graphics..................  161.0    143.3    147.6    148.5    154.5
Total average.............  161.0    156.4    149.4    148.7    138.1
V


The exceptional cases are presented in terms of the per cent of A 
men below the Alpha average of the class and of the E men above that
average. 


English...................  10.7 per cent A men     14.7 per cent E men
Graphics..................   0.0 per cent A men    100.0 per cent E men
Average...................  18.2 per cent A men     32.8 per cent E men


One conspicuous objective at present should be the elimination 
or explanation of cases of extreme disparity between intelligence and 
scholarship. Probably much of this disparity can be attributed to 
differing degrees of motivation. One very satisfactory way to detect 
idlers, men too heavily loaded with extra-curricular activities, and 
men with unusual capacity to develop their potential, is, if we may at 
least tentatively trust intelligence tests, to compare intelligence 
percentiles with scholarship percentiles of the individual men. With 
such accessory data as we may get from case studies and instinct tests, 
a modified intelligence test will probably be one of the most valuable 
instruments in scientific educational administration. 
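
The comparison of percentiles proposed above may be sketched as follows. The names, scores, and the 40-point threshold are hypothetical and serve only to illustrate the procedure; the article prescribes no particular threshold.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, scores):
    """Per cent of the group scoring below value, counting ties as one half."""
    ordered = sorted(scores)
    below = bisect_left(ordered, value)
    ties = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def flag_disparities(alpha, marks, threshold=40.0):
    """List (name, intelligence percentile, scholarship percentile) for every
    man whose two percentiles differ by at least threshold points."""
    alpha_all = list(alpha.values())
    marks_all = list(marks.values())
    flagged = []
    for name in alpha:
        p_int = percentile_rank(alpha[name], alpha_all)
        p_sch = percentile_rank(marks[name], marks_all)
        if abs(p_int - p_sch) >= threshold:
            flagged.append((name, p_int, p_sch))
    return flagged


# Hypothetical Alpha scores and scholarship averages, for illustration only.
alpha = {"a": 180, "b": 150, "c": 120, "d": 165, "e": 135}
marks = {"a": 62, "b": 78, "c": 85, "d": 90, "e": 70}

for name, p_int, p_sch in flag_disparities(alpha, marks):
    print(name, p_int, p_sch)
```

In this invented group, man "a" stands high in intelligence and low in scholarship, and man "c" the reverse; these are the cases that would be referred for individual study.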


VI 


The very nature of Alpha, and the inclusion of certain tests of 
little significance—at least in their present form and degree of difficulty 
—make the diagnostic value of Alpha very dubious. In the rough, the
language group (Greek, Latin, English, French, Spanish, and German) 
seems to stand out from the science group (mathematics, physics, 
chemistry, biology, and graphics). The data are presented in per- 
centiles of the class of 1923. 





Test                       1      2      3      4      5      6      7      8    Total

Language A men.........  56.6   70.6   64.1   76.1   61.8   65.1   63.5   62.4   69.0
Science A men..........  71.4   82.7   64.1   68.9   66.8   76.1   73.9   67.0   76.1
Language E men.........  56.6   54.4   49.5   32.8   31.0   48.0   32.1   35.8   30.3
Science E men..........  56.6   51.5   53.8   40.3   31.0   41.3   29.8   38.0   31.0
Language range.........   0.0   16.2   14.6   43.3   30.8   17.1   31.4   26.6   38.7
Science range..........  18.4   31.2   10.3   28.6   35.8   34.8   43.9   29.0   45.1





Instances in specific studies, however, invalidate any definite 
deductions to be derived. The number completion series, which 
seems slightly diagnostic of sciences, appears to be nearly as good a
prognosticator of Latin ability as total Alpha. The directions test 
seems equally valid (or invalid) in Latin and graphics. The disar- 
ranged sentence test would seem significant in English, French, and 
German, but of neutral value in Spanish. 

Some interesting facts emerge from this statistical accumulus. 
The combination of the arithmetic problems and number completion 
series tests correlates higher with mathematics (0.315 ± 0.027) than
either test separately (0.272 ± 0.028 and 0.307 ± 0.027); but this
specific combination does not correlate so high with mathematics as
total Alpha (0.379 ± 0.026). On the other hand, the combination
of the synonym-antonym and disarranged sentence tests correlates
as well as—but no higher than—total Alpha with English ability
(0.497 ± 0.021 in both cases).
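
Why a combination of two tests gains so little over its better member can be illustrated with the usual formula for the correlation of a sum. The sketch below assumes that the two tests are combined with equal weight after reduction to standard scores, which the article does not state, and the intercorrelations tried are hypothetical, since none is reported.

```python
import math


def composite_correlation(r1c: float, r2c: float, r12: float) -> float:
    """Correlation with a criterion of the equally weighted sum of two
    standardized tests, given each test's correlation with the criterion
    (r1c, r2c) and the correlation between the tests (r12)."""
    return (r1c + r2c) / math.sqrt(2.0 * (1.0 + r12))


# Reported correlations of the two separate tests with mathematics.
r_first, r_second = 0.272, 0.307

# Hypothetical intercorrelations between the two tests.
for r12 in (0.3, 0.5, 0.7):
    combined = composite_correlation(r_first, r_second, r12)
    print(f"r12 = {r12:.1f}: combined r = {combined:.3f}")
```

The more closely the two tests correlate with each other, the less the combination adds; on these assumptions a gain as small as that reported would imply that the two number tests overlap heavily.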

All in all, the present Alpha would, from the standpoint of elective 
advisory purposes, seem to be as random an agent as the traditional 
campus method of selecting courses. But there are, nevertheless, 
positive tendencies which encourage the hope that a series of 10 or 
12 tests may yet be evolved which will be of signal predictive value 
in elective advisory problems, and at the same time a partial index of 
prevocational aptitudes. 
CHEMISTRY AND CHARACTER
THOMAS W. GALLOWAY 


American Social Hygiene Association 


Investigation has gone far enough to convince us that life itself, 
as well as the various phases and shadings of life which appear as 
particular functions, qualities, tendencies, and states, is largely influ-
enced by chemical substances produced in the very act of living. For
example, an active living body quickly produces enough CO₂, first to
accelerate respiration and in a few moments, if it is not eliminated
from the system, to destroy life. The nitrogenous products of living,
if not eliminated, will do the same in the course of a few days. Indeed
in such a complex body as ours, we clearly have a condition in which every
cell in the body pours into the blood substances which may be taken up
and may modify the functions of every other cell in the body. In the
evolution of this mutual adjustment of diverse tissues and their prod- 
ucts there has arisen something like a dozen special groups of cells 
(ductless glands) whose secretions into the blood (hormones or endo- 
crines) are known to have a special and profound influence in keeping 
up that balance which we call life and normality. 

The researches in this most interesting field have reached the acute 
ink-spilling stage, and endocrines seem likely not merely to determine 
the fate of the individual but to activate the “fourth estate” as well. 

In reviewing such books as these there are two equally tempting 
openings. (1) The essential biological, chemical, experimental, and 
therapeutic matter, which is exceedingly interesting; and (2) the impli- 
cations of these for personal education and character. This review will 
be confined chiefly to the latter adventure,¹ although to do so is least
fair to the authors, inasmuch as it is naturally in just this field that 
their work is most hypothetical and least satisfactory. 

Bandler deals with the subject as a gynecologist, and hence empha- 
sises the role of the internal secretions in connection with the phenom- 
ena of sex and reproduction, particularly in the female. The latter 
part of the book, however, discusses the instincts and emotions, 
mental and nervous defects, psychoses, phobias, etc. in terms of the 
quality and quantity of the secretions. While enthusiastic, the book 
is in the main reasonable. 





1 Bandler, S. W., M. D.: “The Endocrines.” Philadelphia; W. B. Saunders 
Co. 
Berman¹ is interesting, vivid, suggestive, picturesque, and erratic.
His style—which includes the incorporation in one saturated solution
(chemical!) of conclusions based on experiment, on speculation, on
momentum, and on temperament—is brilliant; one feels at times,
unnecessarily so. A not unfair illustration of this is: “For, since
menstruation is so closely connected with the phases of the moon and 
the tides, the rhythmicity of the posterior pituitary body may be 
traced to the days when the pineal was an eye in the top of the head, 
and in direct relation with the pituitary.” 

The main objection to such a mixture is not so much that imagina- 
tion is introduced in such liberal proportions. This is quite legiti- 
mate; science needs imagination. Indeed, except in respect to its 
applications to the crass material necessities of life, one would perhaps 
better have imagination without facts than facts without imagination, 
if one must be deprived of either. Nevertheless when they are mixed, 
it is rather important both for the mixer and reader to know when and 
where, and the proportions. An adequate index to this is the great 
need of the book. 

One has the feeling that the author himself plays a bit fast and 
loose with the implication of his thesis. At one place he refers to the 
“bubble of education,” in which he is logical. And yet he recognizes 
the revolutionary character of “psychic conversions” (where there is
no evidence of endocrine causation), in which he is right rather than 
logical. 

It seems to the reviewer very well established (1) that there are 
life and death values for human beings in the endocrines; (2) that they 
modify growth and normality in many particulars; (3) that the inher- 
ited or acquired predominance, or the under-secretion of certain of 
the glands is an important factor in determining classes or types of 
individuals, physically and temperamentally (as, for example, the 
fact that the secretions of the germ cells, coupled at one time or an- 
other with certain others, make all the differences between males and 
females); (4) that excessive or deficient secretions can, in some cases 
at least, be corrected artificially, thus changing profoundly the natural 
personal states; (5) that these influences do extend to, and produce 
variations in, many at least of those personal qualities which collec- 
tively we describe as character or personality. 


1 Berman, Louis, M. D.: “The Glands Regulating Personality.” New York:
1921, The Macmillan Co.
In a practical way, the knowledge of the endocrines will surely
enable us (1) to correct many gross defects of development and func- 
tioning in matters that are basic to character; (2) to secure a better 
general balance of the unconscious and autonomic coordinations; (3) 
to diagnose native trends and types of personal balance, and by means 
of this to guide the individual into most suitable work and adjust- 
ments. In other words, it may well supplement our neuro-muscular 
and intelligence tests for vocational or other guidance. It is possible, 
too, that such knowledge may ultimately give us some power to in- 
crease the strength of particular traits of character, though it is at 
present far from evident that the endocrines are in this degree and 
sense “specifics.”

In the opinion of the reviewer the structural and dynamic psycholo- 
gist still has adequate biological grounds on which to posit the ordinary 
educational procedures based upon the central nervous system and 
its connections. Some of the grounds for this belief are: (1) There 
seems to be no adequate evidence that the endocrine systems or even 
the supposedly omnipotent “standards of the intra-visceral pressures
of the vegetative system” antedate or dominate the functions of the 
central nervous system either in the evolution of organisms or in the 
development of the individual; (2) there is on the contrary abundant 
evidence that even the local nervous ganglia which now control these 
vegetative functions are made up of cells which have migrated from 
this central system; (3) these glands (and hence their secretions) are 
not the cause of the earliest differentiations which lay the foundation 
of individual development, but are rather the much later product 
of these differentiations. In other words the matter of inheritance is 
certainly chemical as well as physical in character, but cannot be in 
any strict sense endocrine—any more than it is “nervous”—in its
primary nature. (4) Both nerves and endocrines are belated individual
specializations; the endocrine-blood reactions are, with possibly
one or two exceptions, entirely too slow of operation, to account for 
the rapid rise of the primary emotional states which accompany the 
sensori-motor responses of life; and hence (5) the education and con- 
trols of the individual by any form of activity and experience is still 
probably to be considered first and fundamentally a direct nervous 
(psychological) process, only secondarily modified, in some now un- 
known degree, by the resulting endocrine changes, which come largely 
as by-products of the nervous situation. 

In estimating then the practical bearing of endocrines upon the 
actual education of personality the reviewer believes that modern
endocrinology is to the older theories of the bodily “humours” as
modern cerebral localization is to phrenology; and that for the practical
development of fairly normal people the probability is that “chemical
localization” will be just about as fruitful as cerebral. The book is
greatly worth a reading on the part of the discriminating educator. 








ADDITIONAL RETESTS BY MEANS OF THE STAN- 
FORD REVISION OF THE BINET-SIMON TESTS 


S. C. GARRISON 
George Peabody College for Teachers 


This article reports 468 retests by means of the Stanford revision 
of the Binet-Simon tests on 170 children. Of these retests, 43 were 
secured at an interval of 4 years, 127 at an interval of 2 years, and 
298 at an interval of 1 year. In School and Society, June 4, 1921, the
writer reported retests on 62 children at an interval of 3 years. The 
retests reported in that article are not included in the data presented 
here. Forty-three children upon whom retests were reported in that 
article are still in school and have been retested this year. The 
material secured in that retesting is included here. Goddard’s revision 
(1911) was used originally in testing the 43 children. All other tests 
and retests were made with the Stanford revision. 

The material presented here was secured by testing as follows: 
94 cases in 1917-1918; 161 cases in 1919-1920; 157 cases in 1920-1921;
and 149 cases in 1921-1922. It will be seen that the same retest is 
counted several times in the total of 468. For example, if a pupil was 
tested in 1917-1918, in 1919-1920, in 1920-1921, and in 1921-1922; 
we have one retest at an interval of 4 years, one at an interval of 2 
years, and two at an interval of 1 year. We also have one retest at an 
interval of 3 years but that was included in the material reported as 
mentioned above. The testing was done as follows: The writer gave 
all the tests in 1917-1918; 138 in 1919-1920; 51 in 1920-1921; and 89 
in 1921-1922. Nine advanced graduate students in educational 
psychology did the other testing. The students doing the testing in 
1920-1921 had done little previous testing. They had, however, 
studied the test very thoroughly and had observed while the instructor 
and others gave the test. They then gave under supervision a number 
of tests. The students who did the work in 1919-1920 and 1921-1922
had all done previous testing and their work was carefully checked 
before they did any of the testing reported in this paper. All had had 
extensive preparation in psychology. It should also be stated that 
the person doing the retesting was ignorant of the results of the previ- 
ous test. No comparisons between results were made until the testing 
program was completed. 

The frequency of each age and the average difference between the 
results of the tests given at an interval of 1 year are shown in Table 
I. The age given in the table is in every case that of the child at the
second testing. There were 127 children who took the test 3 years 
in succession. For each of these children there are two retests and their 
ages are counted twice. The age at each retesting is listed. 

In tabulating the ages we have listed as 12 all pupils who have 
passed their twelfth birthday but who have not yet reached their 
thirteenth. 

A study of the table shows that there is some variability with 
respect to the average difference at the various ages. The larger 
average differences for the fifteenth and sixteenth year groups are 
probably due to the fact that there were several pupils who got practi- 
cally all the tests right at both the first and second testing. These 
pupils would have scored higher had there been more advanced tests. 
We do not really know what the true IQ is for several of these pupils. 
Seven of the 11 pupils 8 years of age made large gains. These were 
all in the same grade and under the same teacher. This teacher with- 
out any previous training or practice undertook to give the Binet-Simon 
test to most of the children of the grade. We discovered that this 
had been done or was being done while we were retesting these pupils. 
We felt at the time that our results for this grade were influenced by 
that factor. If the IQs for these pupils are not included, we find an 
average difference for the remaining 4 of 4.2. 


TABLE I.—SHOWING THE FREQUENCY OF EACH AGE AND THE AVERAGE DIFFERENCE
BETWEEN THE TWO SETS OF IQs FOR THE ONE YEAR INTERVALS

Age    Frequency    Average difference

16          9              7.2
15         39              6.3
14         46              5.5
13         41              4.5
12         44              5.1
11         40              4.8
10         39              4.7
 9         39              4.8
 8         11              5.9





The differences between the IQs secured in the several testings are 
shown in Table II. Slightly more than 50 per cent of the differences 
for the 1-year-interval data lie between -2 and +4 inclusive. For
the 2- and 4-year-interval data these limits are -3 and +4, and -3
and +5 respectively. Of the 468 retests, 40 (or 8.5 per cent) show a
difference of more than 10. Eighty-nine per cent show a difference
of 8 or less. The table shows that there is a gain in 55 per cent of the 
retests and a loss in 38 per cent. The same IQ was found in 7 per cent 
of the cases. 


TABLE II.—SHOWING DISTRIBUTION OF DIFFERENCES IN IQs BETWEEN TESTS

                  Frequency          Frequency          Frequency
Differences    1-year interval    2-year interval    4-year interval

    15                5                  6                 ..
    14                2                  1                  1
    13                3                 ..                 ..
    12                2                  2                  1
    11                3                  5                 ..
    10                2                  3                  1
     9                3                  4                  2
     8                7                  2                  3
     7               12                  6                  2
     6               16                  1                  3
     5               20                  5                  3
     4               23                 10                  4
     3               27                  8                  3
     2               22                 13                  2
     1               17                  5                  1
     0               22                  9                 ..
    -1               19                  7                  4
    -2               19                  8                  4
    -3               14                  6                  3
    -4               18                  6                  1
    -5               11                  4                 ..
    -6                7                  5                  1
    -7               11                  4                  1
    -8                6                  2                 ..
    -9                2                 ..                 ..
   -10                2                  1                 ..
   -11               ..                  2                  1
   -12                1                 ..                 ..
   -13                2                  1                 ..
   -14               ..                 ..                 ..
   -15               ..                  1                  1





Tables IIIa, IIIb, and IIIc give the average gain or loss and the
number whose IQ was larger or smaller or remained the same at the
second testing. The data are tabulated according to degree of brightness.

TABLE IIIa.—SHOWING THE AVERAGE GAIN OR LOSS AND THE NUMBER GAINING
OR LOSING OR REMAINING THE SAME FOR THE 1-YEAR-INTERVAL DATA,
WHEN CLASSIFIED ACCORDING TO DEGREE OF BRIGHTNESS

Intelligence    Average gain    Number     Number    Number
quotient          or loss       gaining    losing    the same

120+                1.5            37         21         3
110-119             1.6            56         32         5
100-109             0.9            47         32         8
 90- 99             0.3            17         16         4
Below 90            1.4             8         11         2

TABLE IIIb.—SHOWING THE AVERAGE GAIN OR LOSS AND THE NUMBER GAINING
OR LOSING OR REMAINING THE SAME FOR THE 2-YEAR-INTERVAL DATA,
WHEN CLASSIFIED ACCORDING TO DEGREE OF BRIGHTNESS

Intelligence    Average gain    Number     Number    Number
quotient          or loss       gaining    losing    the same

120+                3.6            13          7         2
110-119             1.2            24         12         2
100-109             0.8            20         12         2
 90- 99             0.3             9          8         0
Below 90            1.3             5          8         3

TABLE IIIc.—SHOWING THE AVERAGE GAIN OR LOSS AND THE NUMBER GAINING
OR LOSING OR REMAINING THE SAME FOR THE 4-YEAR-INTERVAL DATA,
WHEN CLASSIFIED ACCORDING TO DEGREE OF BRIGHTNESS

Intelligence    Average gain    Number     Number    Number
quotient          or loss       gaining    losing    the same

120+                4.0             7          3         0
110-119             5.1             8          3         0
100-109             2.0             6          5         0
 90- 99             0.7             3          4         1
Below 90            3.3             2          1         0

It will be seen that a large proportion of the children test
rather high. As a matter of fact the median IQ for the 170 children
is 112. The high selection shown here is accounted for as follows: (1) 
The school is located in one of the best residential sections of the city 
and is patronized very largely by people engaged in the professions. 
(2) A good tuition fee is charged. (3) Parents of children who do
poor work are asked to withdraw their children from the school. 

Several things seem to be indicated by the tables. A good majority 
of the children do better in the second test than they did in the first. 
There is a gain in 55 per cent of the cases. When the children are 
classified according to degree of brightness, the higher classes seem to 
gain more on the average than the lower. The lower classes seem to 
remain about the same or possibly lose a small amount. Our retests 
are too few in number to draw any definite conclusions on this point 
however. If we omit one record in our lowest group we have a small 
average gain showing instead of a loss in both Tables IIIb and IIIc. 
The large average gain shown in the higher classes in Table IIIc is 
doubtless due in part to the fact that Goddard’s revision was used in 
the first testing (1917-1918). Since there does seem to be a slight 
gain in the higher classes, it is evident that there is a slight practise 
effect, that the test is relatively easier in the higher ages, or that the 
IQ actually increases for the higher classes. We feel that there are
not enough data available yet to warrant definite conclusions. 


TABLE IV.—SHOWING THE RESULTS OF 468 RETESTS

[Correlation table of the two sets of IQs, grouped in 5-point classes from 80 to 155+.]
1 For a summary of the data reported see Rugg and Colloton, Constancy of the
Stanford-Binet IQ as Shown by Retests. Journal of Educational Psychology, 
September, 1921. 
We have planned a retesting program and hope in a few years to
have data which will throw light on questions raised here and else- 
where. At present we have three records on our third grade children, 
two on the second grade, and one on the first grade. That material 
is not included in this report. It is our intention to retest these chil- 
dren and the children in the following grades at an interval of one year 
until the present third grade finishes the eighth grade. 

Our data are reported in Table IV. In this table the IQs from 
123 to 127 inclusive are listed as 125. We found a coefficient of corre- 
lation between the tests at a 1-year interval of 0.88, at a 2-year interval 
of 0.91, and at a 4-year interval of 0.83. 
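
The grouping and the coefficient described above may be sketched as follows. The sketch assumes that every class is formed like the one cited (123 to 127 listed as 125) and that the coefficient is the ordinary product-moment correlation; the paired IQs are hypothetical and are not the data of this study.

```python
import math


def iq_class(iq: int) -> int:
    """Midpoint of the 5-point class containing iq, so that 123-127 -> 125."""
    return 5 * ((iq + 2) // 5)


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical first-test and retest IQs for six children.
first = [98, 104, 111, 117, 123, 131]
second = [101, 102, 115, 114, 127, 129]

print([iq_class(iq) for iq in first])
print(round(pearson_r(first, second), 2))
```

Grouping into classes is a convenience for tabulation; the coefficient itself can be computed from the ungrouped IQs, as here.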











NOTES ON ARTICLES IN EDUCATIONAL 
PSYCHOLOGY IN CURRENT ISSUES OF 


OTHER MAGAZINES


REPORTED BY CECILE COLLOTON 
Department of Educational Psychology, The Lincoln School of Teachers College 











INTELLIGENCE TESTS 


A Brief History of Mental Tests. Andrew T. Wylie. Teachers College 
Record, 1922, January, 19-33. A very brief summary of the history and devel- 
opment of intelligence and educational tests. Some of the most important tests 
are listed with the names of the authors and the dates of publication. 

Tests for Ability before College Entrance. J.B. Johnston. School and Society, 
1922, Apr. 1, 345-353. Report of a study conducted at the University of Minne- 
sota to determine the predictive value of entrance ratings of four types, (1) rank 
in high school classes, (2) advanced studies in high school, (3) marks on English 
themes, (4) score on intelligence tests. Discussion of the effect of extra curricular 
activities on scholarship in college. 

Some Uses for Intelligence Tests. Samuel S. Brooks. Journal of Educational
Research, 1922, March, 217-238. Eighth article on “Putting Standardized 
Tests to Practical Use in Rural Schools.” Grading pupils by means of group 
intelligence tests supplemented by the Binet-Simon. 

A Comparative Study of Four Group Scales for the Primary Grades. V. A. C.
Henmon and Ruth Streitz. Journal of Educational Research, 1922, March, 
185-194. Pressey’s Primer Scale, Myers’ Mental Measure, Dearborn’s Group 
Test Series 1, and Haggerty’s Delta 1 compared as to correlation with teachers’ 
estimates, discriminative capacity, and conformity to natural distribution curve. 
Pressey, Dearborn and Haggerty of practically equal value. Pressey and Haggerty 
easier to administer and score. One hundred pupils in first and second grade 
classes tested. 

The Validity of the Whipple Group Test in the Fourth and Fifth Grades. Helen 
Davis. Journal of Educational Research, 1922, March, 239-244. The effect- 
iveness of the Whipple Group Tests in selecting pupils from the 4th and 5th grades 
for “speed classes” at Jackson, Michigan.

Does Intelligence Tell in First-grade Reading? W. W. Theisen. Elementary 
School Journal, 1922, March, 530-534. A study of three groups of primary 
pupils classified on the basis of intelligence by means of the Pressey Primer Scale. 
Progress of the groups in reading measured by the Haggerty Reading Test. 
Advantages of grouping entering pupils. 

The Intelligence Testing Program of the Detroit Public Schools. Warren K. 
Layton. School and Society, 1922, Apr. 1, 368-372. A detailed description of 
the work of the Psychological Clinic of the Department of Special Education, 
Detroit Public Schools. The tests used; when given; uses of test results; etc. 
The Value of Intelligence Tests in Universities. J. W. Bridges. School and
Society, 1922, March 18, 295-302. Weaknesses of intelligence tests in colleges and 
universities as shown by data secured by questionnaire from 42 universities. 

The South Dakota Group Intelligence Test for High Schools. W. H. Batson. 
School and Society, 1922, March 18, 311-315. Results of a battery of six tests 
designed especially for high schools and administered to 1453 students in 27 
schools. 

A Clinical Survey of a First Grade. Gladys G. Ide. The Psychological Clinic, 
1922, January-February, 274-287. Examination of 400 first grade children by 
educational, psychological, and physical tests. Results of tests and recommenda-
tions on basis of results.

The Relative Efficiencies of Distributed and Concentrated Study in Memorizing. 
Edward S. Robinson. Journal of Experimental Psychology, 1921, October, 
327-343. Two experiments conducted with students in Yale University to study 
various factors in the two methods of memorizing and to determine the relative 
merits of each. Bibliography. 


CASE STUDIES


Four Cases of Diagnostic Teaching. Gladys Poole. The Psychological Clinic, 
1922, January-February, 225-229. Four case histories of children studied in the 
Psychological Clinic at the University of Pennsylvania. Diagnosis made on the 
basis of the child’s response to teaching. 

A Case of Special Difficulty with Reading. Bernice Leland. The Psychological 
Clinic, 1922, January-February, 238-244. Detailed history of a child’s difficulty 
in reading and the remedial measures used. 

Five Cases in Vocational Guidance. Rebecca E. Leaming. The Psychological 
Clinic, 1922, January-February, 245-255. Five case histories showing the prob- 
lems met by a counselor in Junior Employment Service. 

Diagnostic Problems in Educational Guidance at the Observation School, Uni- 
versity of Pennsylvania, Summer of 1920. Gladys G. Ide. The Psychological 
Clinic, 1922, January-February, 265-273. Case studies of children in summer 
school. Need for a curriculum adapted to the “over-aged, the dull, the physically 
defective.” 

The Relation of the Conduct Difficulties of a Group of Public School Boys to their 
Mental Status and Home Environment. Eleanor Hope Johnson. Journal of 
Delinquency, 1921, November, 549-574. Report in detail of a study of 52 boys 
reported as “problems in conduct.”

Near-Delinquents in the Public Schools. Mary Bess Henry. Journal of Delin- 
quency, 1921, November, 529-548. Case histories of 50 children who present 
serious problems in the schools. 


MISCELLANEOUS 


The “Double Track” System in a Small School. C. W. Odell. The Elementary
School Journal, 1922, March, 544-546. Description of a flexible plan of school
progress in use in a typical consolidated township school. Adaptation of state
courses of study to two sections. Section A completes course in 7 years, Section
B in 8 years.

Some Data on Anatomical Age and Its Relation to Intelligence. Frances Lowell 
and Herbert Woodrow. Pedagogical Seminary, 1922, March, 1-15. A study
of the carpal development of 402 Minneapolis and St. Paul school children with 
reference to sex, chronological age, and number of permanent teeth. Comparison 
of carpal development with mental age as determined by the Kuhlman 1917 
Revision of the Binet Test. 

Child Labor and Mental Age. Raymond G. Fuller. The Pedagogical Semi- 
nary, 1922, March, 64-71. A plea for the adaptation of the school system pri- 
marily to the needs of the 85 per cent now supposedly incapable of profiting by 
staying in school until they are 16.

Educational Measurement as a Key to Individual Instruction and Promotions. 
Carleton W. Washburne. Journal of Educational Research, 1922, March, 195- 
206. Three necessary steps in placing a school system on an individual basis: 
(1) establishment of subject matter units; (2) preparation of tests completely 
covering each subject matter unit; (3) preparation of self-corrective practice 
materials. Description of work in the public schools of Winnetka, Illinois. 
Illustrative tests and “goals.” 

Short Scales for Measuring Habits of Good Citizenship. Clara Chassell, Siegried 
Upton and Laura Chassell. Teachers College Record, 1922, January, 52-59. 
Eight short scales for measuring the habits and attitudes of good citizenship are 
described and their derivation and construction explained. Suggestions for 
various uses of the results and advantages and disadvantages of the scales are 
given in detail. 

The Description of the Performances of Pupils on Exercises of Varying Difficulty. 
Walter S. Monroe. School and Society, 1922, March 25, 341-343. Studies of 
various tests show close correlation between weighted and unweighted scores. 
Number of exercises done correctly practically as good a description of a pupil’s 
performance as a weighted score. 

Sectioning Classes on the Basis of Ability. C. E. Seashore. School and 
Society, 1922, April 1, 353-358. Description of a plan for sectioning college 
classes in fundamental courses on the basis of ability to progress, as shown by a 
competitive test at the beginning of the course. Advantages of the plan and 
possible objections to it are summarized. 

Failures Due to Language Difficulty. Cornelia Mann. The Psychological 
Clinic, 1922, January-February, 230-237. Significant differences in results of 
testing two kindergarten groups with the Stanford Binet. Children from homes 
where no English is spoken at decided disadvantage in test and in first grade work.

The Effects of Practice upon the Scores and Predictive Value of the Alpha Intelli- 
gence Examination. Florence Richardson and Edward S. Robinson. Journal of 
Experimental Psychology, 1921, August, 300-317. Report of an experiment in 
administering the Alpha test to college students on three successive days. Scores 
on second performance probably the most reliable. Reasons for improvement. 













NEW PUBLICATIONS IN EDUCATIONAL 
PSYCHOLOGY AND RELATED FIELDS OF 


EDUCATION











1. A Mental Survey of High School Seniors.—The idea of a mental
survey of any large group of children is relatively new, but the increas- 
ing number and efficiency of group intelligence tests will naturally 
result in many surveys in the near future. The plea for such surveys 
made by the reviewer in 1918 is already bearing fruit, and they are 
being conducted more thoroughly and efficiently than he would have 
imagined possible at that time. As the significance of this sort of 
work becomes apparent to educators and sociologists, it will certainly 
lead to a great increase in the number of such surveys, because an 
inventory of the raw human material concerned is a necessity for a 
correct appreciation of every educational, social, and industrial prob- 
lem. Professor Book has taken a horizontal section of the human 
material of the State of Indiana. The section he has chosen is narrow 
and very limited, but it is at the same time extremely important.
Intelligence tests were given to 6188 senior high school students¹ and
the results may, therefore, be considered representative of the mental 
caliber of senior high school students in Indiana. Only some of the 
most significant results can be mentioned in this review. The
tremendous differences in intelligence found in different schools and in
different communities are again emphasized, and the wide range of
intelligence of the whole group serves again to call attention to the need for
readjustment of the curriculum to the different mental levels of the 
pupils. Most significant for the college and the university is the fact 
that about as many students of inferior or mediocre intelligence are 
planning to go to college as students of superior intelligence. If the 
universities in a democracy are intended to attract and educate the 
youth of superior intelligence, they are failing in the sense that a large 
percentage of such individuals are not even planning to attend. 
Furthermore, the high schools themselves do not seem to be at all 
successful in their handling of the superior mental material, as

1 Book, W. F.: “The Intelligence of High School Seniors.” Macmillan,
1922.

illustrated by the percentage of superior students that is retarded or held
for the conventional four year course. The author rightly emphasizes 
again and again the waste in superior ability that the survey reveals. 
The failure of our educational system properly to make use of superior 
ability in the elementary school, the high school and the college, is 
gradually being revealed by intelligence tests. The relation of 
intelligence to the vocational choice of the pupils shows the need for 
vocational advice and guidance. Incidentally it should be of interest 
to the profession of medicine to notice the relatively low standing of 
the students who are planning to study medicine. The survey shows 
that the manufacturing districts of the state contribute a larger 
percentage of superior students than do the agricultural districts. 
The agricultural districts contribute a much larger percentage of 
inferior students. All districts and all economic classes and all types 
of schools, however, possess children of all grades of intelligence, 
although of course in different amounts. A slight sex difference in 
favor of the boys is shown, and this combined with the fact that the 
girls are more successful in their school work makes the author raise 
the question as to whether the high school is not less well adapted to 
boys than to girls. 

The need for methods of evaluating school achievement in terms 
of mental ability is stressed by the author, and it is surprising to the 
reviewer that he has not pointed out the different ways that have 
already been suggested and tried out by other workers. There are 
many other important and valuable results in the book which cannot 
be mentioned in this review. It is a book that we can strongly recom- 
mend to all high school teachers and principals and it has a distinct 
lesson for the educator, psychologist and sociologist. 

The American high school is not truly democratic, because it fails 
to allow for differences in intelligence, and only by so doing can it give 
to each full opportunity to develop to the utmost his individual 
capacities. R. P. 





2. A New Book on the Psychology of Effective Study—The author 
of this new book for teachers in training¹ has discussed the significance
of training for effective study with fine insight into the underlying 
psychological principles. His definition of effective study sets

1 Thomas, Frank W.: “Training for Effective Study.” Boston: Houghton-Mifflin
Co., 1922, pp. XVIII + 251.

standards which even experienced teachers will do well to recognize. He
would have teachers trained to direct pupils in the acquisition of study 
habits and procedures and to develop in them the ability to think and 
plan toward the solution of specific problems, to adopt purposes and 
assume the responsibility for carrying them out. He would have 
teachers recognize the psychology of the instincts and the fundamental 
considerations underlying any training which is to result in self-direc- 
tion and the acquisition of socially desirable habits and tendencies. 
He criticises traditional practice, pointing out weaknesses and making 
constructive suggestions for improvement. He does this with con- 
crete illustrations which lend clarity to the discussion, and facilitates 
study as he conceives it. This, together with the summaries and
questions for study at the end of each chapter, recommends the book for
class room use in normal schools and other teacher training institutions. 
L. Z. 





3. A Practical Volume Based on Scientific Reading—The
conclusions of numerous scientific studies of reading should modify current
practice much more than they have. The new volume on Silent and 
Oral Reading by Clarence R. Stone¹ will certainly facilitate the
adoption of scientific methods of instruction in reading. It brings together
and interprets the results of psychological and educational research and 
supplies concrete and practical suggestions covering a wide range of 
teaching needs. 

The organization of the content and the full index make it easy for 
teachers to use the book in the solution of specific problems. After 
a summary of the present situation and the outlook in Chapter I the 
contributions of research are discussed in the succeeding chapter. 
There follows a chapter on reading in the primary grades and another 
on the intermediate and upper grades. Four chapters are then de- 
voted to specific problems and suggestions based on research and experi- 
mentation. Chapter IX contains a critical discussion of available 
reading tests and their use. The final chapter deals with individual 
differences and special individual and group instruction. Each chap- 
ter is followed by a group of practical problems for study and discussion. 
The bibliography is very brief and does not include all the references 
used in the text. 


1 Stone, Clarence R.: “Silent and Oral Reading.” Boston: Houghton-Mifflin
Co., 1922, pp. XVIII + 306.
We agree with Dr. Cubberley, the editor, who says in his
introduction: “The contents of this volume ought to be the common
property of all elementary-school principals and supervisory school officers
who have supervisory oversight of elementary-school work, and be used 
by them as a basis for their supervision of the elementary-school work 
in reading. It ought also to be used by students in normal schools and 
teacher-training institutions in connection with the work in teaching 
methods and training-school practice. It would also form a very 
profitable study for teachers in service in connection with reading- 
circle study. Its simple style, absence of technical procedure, and 
very practical application to school room procedure all combine to 
make it an unusually useful book for the class room teacher to read 
and to follow.” L. Z. 











